Oh Dear is the all-in-one monitoring tool for your entire website. We monitor uptime, SSL certificates, broken links, scheduled tasks and more. You'll get a notifications for us when something's wrong. All that paired with a developer friendly API and kick-ass documentation. O, and you'll also be able to create a public status page under a minute. Start monitoring using our free trial now.

Breaking Laravel's firstOrCreate using race conditions

Original – by Freek Van der Herten – 4 minute read

Recently I was working on a client project where a data import was performed via queues. Each record was imported in its own queued job using multiple queue workers. After the data import was done we had more rows than expected in the database. In this blogpost I'd like to explain why that happened.

Inside the queued job that imports a single record, firstOrCreate was being used. If we look at the code of this native Laravel function we see that two queries are performed.

/**
 * Get the first record matching the attributes or create it.
 *
 * @param  array  $attributes
 * @param  array  $values
 * @return \Illuminate\Database\Eloquent\Model
 */
public function firstOrCreate(array $attributes, array $values = [])
{
    if (! is_null($instance = $this->where($attributes)->first())) {
        return $instance;
    }
    return tap($this->newModelInstance($attributes + $values), function ($instance) {
        $instance->save();
    });
}

First, a select query with a where condition is performed to test if the record exists or not. If it does, the model instance of the record is returned. If it doesn’t, we create a record.

If you don't use this in a queue, and there isn't too much traffic on your app, this works perfectly. But in a queued job, or under heavy load, there’s a chance that this can go wrong. If job A and B are executed simultaneously there’s a chance that both jobs will execute the first select query of firstOrCreate at exactly the same time. Both jobs will conclude the record does not exist yet. Both jobs will create the same record and you'll end up with duplicates.

You might think that this is an edge case. It turns out that it's really not that uncommon for this problem to occur. I created a demo application that demonstrates that you can run into this problem quite quickly. The app has the following simple command to dispatch a few jobs that will create products with the same uuid.

namespace App\Console\Commands;

use App\Jobs\CreateProductJob;
use Illuminate\Console\Command;
use Ramsey\Uuid\Uuid;
class CreateProductsCommand extends Command
{
    protected $signature = 'create-products';
		
    protected $description = 'Create some products';
		
    public function handle()
    {
        $uuid = (string) Uuid::uuid4();
        foreach(range(1, 5) as $i) {
            dispatch(new CreateProductJob($uuid));
        }
    }
}

The CreateProductJob looks like this:

namespace App\Jobs;

use App\Product;
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
class CreateProductJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
		
    /** @var string */
    public $uuid;
		
    public function __construct(string $uuid)
    {
        $this->uuid = $uuid;
    }
    public function handle()
    {
        Product::firstOrCreate(['uuid' => $this->uuid]);
    }
}

The app uses a Redis queue and Horizon by default. Horizon is configured to use 3 processes and thus dispatch 3 jobs concurrently. If you fire that command you'll see that, even with 5 jobs queued we'll already end up will duplicate records. Here's the demo in action:

I know the example is a bit contrived. I'd like to emphase that this problem is not caused by Redis, Horizon or Laravel. It happens because things are executed concurrently. In a real world project you should and will probably put a unique index on the uuid column to prevent your database from storing any duplicates. In this post I merely wanted to remind you that it doesn't take much to let firstOrCreate produce unexpected results if multiple jobs are passing the same arguments to it.

To solve this problem there are a couple of solutions. You could set the amount of processes of the queue to 1 so multiple jobs won't be executed concurrently. If you go that route I'd recommend scheduling the jobs on another queue, so the default queue can still use multiple processes. You could also solve it on the database level. There's an INSERT INTO .. ON DUPLICATE KEY that you can leverage. You'll find an example in this gist on GitHub.

Many thanks to my colleague Alex who helped identify this problem in our client project and reviewed this post. Do you have another strategy to avoid this problem? Let me know in the comments below.

Stay up to date with all things Laravel, PHP, and JavaScript.

You can follow me on these platforms:

On all these platforms, regularly share programming tips, and what I myself have learned in ongoing projects.

Every month I send out a newsletter containing lots of interesting stuff for the modern PHP developer.

Expect quick tips & tricks, interesting tutorials, opinions and packages. Because I work with Laravel every day there is an emphasis on that framework.

Rest assured that I will only use your email address to send you the newsletter and will not use it for any other purposes.

Comments

Lloric Mayuga Garcia avatar
Lloric Mayuga Garcia

it is possible to add test for it?

mpyw avatar

This issue has been resolved since Laravel 10.22.0.

https://zenn.dev/mpyw/articles/laravel-v10-create-or-first

Please read it through translation.

Comments powered by Laravel Comments
Want to join the conversation? Log in or create an account to post a comment.