In an application I’m working on, users are represented with initials. In order to generate seed data I used Faker to fill their information but I struggled to generate unique initials from their first & last name.
Here’s my initial User Factory.
$factory->define(User::class, function (Faker $faker) {
    $first_name = $faker->firstName;
    $last_name = $faker->lastName;

    return [
        'initials' => $first_name[0] . $last_name[0],
        'first_name' => $first_name,
        'last_name' => $last_name,
        'email' => $faker->unique()->safeEmail,
        'password' => '$2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi', // password
        'remember_token' => Str::random(10),
    ];
});
You can see how I implemented this at first. Of course this works great, but I ran into some issues that might be obvious to you.
Initials must be unique
My first issue occurred when multiple users shared the same initials. The seeder will throw a QueryException if you set your initials column as unique.
Here was my first attempt at solving this issue.
$factory->define(User::class, function (Faker $faker) {
    $first_name = $faker->firstName;
    $last_name = $faker->lastName;
    $suffix = 0;

    // Keep adding letters of the last name until the initials are not in the database
    do {
        $initials = $first_name[0];

        for ($i = 0; $i <= $suffix; $i++) {
            $initials .= $last_name[$i];
        }

        $suffix++;
    } while (User::where('initials', $initials)->count() === 1);

    return [
        'initials' => $initials,
        'first_name' => $first_name,
        'last_name' => $last_name,
        'email' => $faker->unique()->safeEmail,
        'password' => '$2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi', // password
        'remember_token' => Str::random(10),
    ];
});
I thought this would fix it, but I didn’t account for the fact that the seeder only creates the models after generating all of them.
This means that my query User::where('initials', $initials)->count() will not take the newly generated models into account.
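The timing problem can be reproduced without Laravel. The sketch below is a plain-PHP simulation, not the framework's actual code: `$saved` stands in for the users table and `$pending` for the models the factory has built but not yet inserted. Both names are made up for the illustration.

```php
<?php
// Plain-PHP simulation of "build everything first, insert afterwards".
// $saved and $pending are hypothetical stand-ins, not Laravel APIs.
$saved = [];    // rows already in the users table
$pending = [];  // models built by the factory, not inserted yet

foreach ([['Jade', 'Leonard'], ['Jules', 'Lambert']] as [$firstName, $lastName]) {
    $initials = $firstName[0] . $lastName[0];

    // Same idea as the database check: only saved rows are visible here.
    if (in_array($initials, $saved, true)) {
        $initials .= $lastName[1];
    }

    $pending[] = $initials;
}

// Count how many pending models share initials with another pending model.
$duplicates = count($pending) - count(array_unique($pending));
```

Because `$saved` is still empty while the loop runs, the check never fires and both models keep the same initials. The clash would only surface once the rows are inserted.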
How to generate unique fields with Model Factory?
If the field you want to be unique is generated from Faker’s API, you can use the unique() method on it. You can see I already use that in the email field.
But the initials are not generated by Faker’s API in my case, so I have to do something else.
In fact it’s possible to share a variable throughout the factory loop (if you create several models at once). You have to use a static variable for this. Let’s see how we can use this static variable to solve our issue.
$factory->define(User::class, function (Faker $faker) {
    // Shared between all the calls to this closure
    static $existingInitials;
    $existingInitials = $existingInitials ?: [];

    $first_name = $faker->firstName;
    $last_name = $faker->lastName;
    $suffix = 0;

    do {
        $initials = $first_name[0];

        for ($i = 0; $i <= $suffix; $i++) {
            $initials .= $last_name[$i];
        }

        $suffix++;
    } while (in_array($initials, $existingInitials));

    $existingInitials[] = $initials;

    return [
        'initials' => $initials,
        'first_name' => $first_name,
        'last_name' => $last_name,
        'email' => $faker->unique()->safeEmail,
        'password' => '$2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi', // password
        'remember_token' => Str::random(10),
    ];
});
Great! It works!
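If the static variable feels like magic, here is the same mechanism in isolation, with no Laravel or Faker involved. It is only a minimal sketch; the closure name `$isNew` and the variable `$seen` are made up for the example.

```php
<?php
// A closure with a static variable: $seen is initialised once and then
// keeps its contents across every call to this same closure.
$isNew = function (string $initials): bool {
    static $seen = [];

    if (in_array($initials, $seen, true)) {
        return false; // already handed out in an earlier call
    }

    $seen[] = $initials;

    return true;
};

$first = $isNew('JL');   // nothing recorded yet
$second = $isNew('JL');  // remembered from the first call
$third = $isNew('AB');   // a value not seen before
```

The memory only lasts for the current PHP process, so every new seeder run starts again with an empty list.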
What about special characters?
Soon after this I ran into another issue when I started seeding more users. With more users, the initials have to get longer to stay unique. And while few names have a special character in the first position, it happens more often in the second position.
Illuminate\Database\QueryException : SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xC3' for column 'initials' at row 1 (SQL: insert into `users` (`initials`, `first_name`, `last_name`, `email`, `password`, `remember_token`, `updated_at`, `created_at`) values (JL�, Jade, Léonard, [email protected], $2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi, 65Yeja16Fg, 2019-07-18 11:54:13, 2019-07-18 11:54:13))
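The query fails because \xC3 on its own is not a valid string value. It is actually the first byte of the é character in Léonard: é takes two bytes in UTF-8 (\xC3\xA9), and $last_name[1] only picks up the first one.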
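To see where that byte comes from, here is a short check in plain PHP. It assumes PHP 7.0 or later for the `\u{...}` escape, which I use so the example does not depend on how the file itself is encoded.

```php
<?php
// "\u{E9}" is é, written as an escape so the example does not depend on
// the encoding of this source file (requires PHP 7.0+).
$lastName = "L\u{E9}onard";

$byteLength = strlen($lastName);      // strlen() counts bytes, not characters
$secondByte = bin2hex($lastName[1]);  // the single byte at offset 1
$eAcute = bin2hex("\u{E9}");          // the full UTF-8 encoding of é
```

Léonard has 7 characters but takes 8 bytes, and string offsets in PHP address bytes. Offset 1 therefore returns c3, which is only half of the c3a9 sequence that encodes é.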
What I want in this case is to transform the é into an e.
My first instinct was to search on Stack Overflow. There were some solutions, but they seemed quite complicated. I tried the iconv function but it didn’t work the way I wanted (it converted é into 'e). Other solutions required an array of replacements for special characters, which seemed a bit too much for my case.
Finally I looked into the Laravel helpers. At first there didn’t seem to be a helper for this, but when I looked into the Str::slug() helper, I noticed it used another helper called Str::ascii(), which is exactly what I needed.
Let’s put this into the factory.
use Illuminate\Support\Str;

$factory->define(User::class, function (Faker $faker) {
    static $existingInitials;
    $existingInitials = $existingInitials ?: [];

    $first_name = $faker->firstName;
    $last_name = $faker->lastName;
    $suffix = 0;

    do {
        // Transliterate the whole name to ASCII before picking a character
        $initials = Str::ascii($first_name)[0];

        for ($i = 0; $i <= $suffix; $i++) {
            $initials .= Str::ascii($last_name)[$i];
        }

        $suffix++;
    } while (in_array($initials, $existingInitials));

    $existingInitials[] = $initials;

    return [
        'initials' => $initials,
        'first_name' => $first_name,
        'last_name' => $last_name,
        'email' => $faker->unique()->safeEmail,
        'password' => '$2y$10$92IXUNpkjO0rOQ5byMi.Ye4oKoEa3Ro9llC/.og/at2.uheWG/igi', // password
        'remember_token' => Str::random(10),
    ];
});
Do note that we are transforming the whole last_name and not only one character, because when the last name is Léonard, $last_name[1] returns only the first byte of é, which shows up as a special character (Ã).
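The order of the two operations is easy to get wrong, so here is a small comparison in plain PHP. `toAscii()` is a hypothetical, deliberately tiny stand-in for Str::ascii() so the snippet runs without Laravel; it only knows the one character this example needs.

```php
<?php
// toAscii() is a hypothetical, deliberately tiny stand-in for Str::ascii():
// it only knows the single character needed for this example.
function toAscii(string $value): string
{
    return strtr($value, ["\u{E9}" => 'e']);
}

$lastName = "L\u{E9}onard"; // Léonard

// Transliterate the whole string, then index: a clean ASCII character.
$good = toAscii($lastName)[1];

// Index first, then transliterate: the lone byte \xC3 is not é,
// so nothing is replaced and the broken byte survives.
$bad = toAscii($lastName[1]);
```

Only the first variant gives a usable character, which is why the factory calls Str::ascii() on the full name before reading an offset.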
Nice! I now have unique initials without special characters.
Make sure initials are uppercase
Of course initials must be uppercase. I forgot this, so let’s quickly change the code.
do {
    $initials = strtoupper(Str::ascii($first_name)[0]);

    for ($i = 0; $i <= $suffix; $i++) {
        $initials .= strtoupper(Str::ascii($last_name)[$i]);
    }

    $suffix++;
} while (in_array($initials, $existingInitials));
Illuminate\Database\QueryException Duplicate entry
What?! How?! Why?!
I seeded more users at once and got a new QueryException, Duplicate entry, on the initials.
It appears that one of the users was named Arne De Pauw. Another user already had the initials ADE, so the script tried to generate the initials with an additional character: "ADE " (with a trailing space).
If you compare "ADE" with "ADE " in MySQL, they are treated as equal. This is how MySQL’s PAD SPACE collations work: trailing spaces are ignored when comparing strings, so the unique index sees a duplicate.
Let’s fix this.
$character = strtoupper(Str::ascii($last_name)[$i]);

// Skip characters that should not be part of the initials
if (in_array($character, [' ', '-', '\'', ''])) {
    continue;
}

$initials .= strtoupper($character);
What if I already have users in my DB?
Good question. We can easily handle this by setting the initial value of $existingInitials to the current list of initials.
$existingInitials = $existingInitials ?: User::pluck('initials')->toArray();
One more thing…
Before we finish this, you may run into a final error when seeding a large number of users.
ErrorException : Uninitialized string offset: 3
This error will be thrown if the last name is not long enough to make a unique set of initials.
In my case it was an issue with a short last name of 5 letters. If you have a lot of users, the length of initials will start to grow and reach 5 or 6 letters.
A simple solution I came up with: when the last name is not long enough, I just append X to the initials.
$safeLastName = Str::ascii($last_name);
$initials = strtoupper(Str::ascii($first_name)[0]);

for ($i = 0; $i <= $suffix; $i++) {
    // The last name is too short: pad with X
    if (! isset($safeLastName[$i])) {
        $initials .= 'X';
        continue;
    }

    $character = $safeLastName[$i];

    if (in_array($character, [' ', '-', '\'', ''])) {
        continue;
    }

    $initials .= strtoupper($character);
}
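To check the whole loop outside of a seeder, the logic can be pulled into a plain function. This is only a sketch under a few assumptions: `makeInitials()` is a name I made up, the names passed in are non-empty and already transliterated to ASCII (the job Str::ascii() does in the factory), and the list of taken initials is passed in as an array instead of living in a static variable.

```php
<?php
/**
 * Hypothetical helper, not part of Laravel: builds initials that are not
 * yet in $existing. Expects non-empty, already-ASCII names.
 */
function makeInitials(string $firstName, string $lastName, array $existing): string
{
    $suffix = 0;

    do {
        $initials = strtoupper($firstName[0]);

        for ($i = 0; $i <= $suffix; $i++) {
            // Last name exhausted: pad with X so the loop can still grow.
            if (!isset($lastName[$i])) {
                $initials .= 'X';
                continue;
            }

            $character = $lastName[$i];

            // Spaces, hyphens and apostrophes never become part of initials.
            if (in_array($character, [' ', '-', "'"], true)) {
                continue;
            }

            $initials .= strtoupper($character);
        }

        $suffix++;
    } while (in_array($initials, $existing, true));

    return $initials;
}

$plain = makeInitials('Jade', 'Leonard', []);
$longer = makeInitials('Jade', 'Leonard', ['JL']);
$spaced = makeInitials('Arne', 'De Pauw', ['AD', 'ADE']);
$padded = makeInitials('Bo', 'Li', ['BL', 'BLI']);
```

With the Arne De Pauw case from earlier, the space is skipped and the function moves on to the next letter instead of producing initials with a trailing space.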
Conclusion
We now have a pretty good factory that can generate unique initials.
It can be improved; here are some ideas:
- When names have multiple parts, use the first character of each part (Jean-Claude Van Damme would become JCVD)
- Use a number instead of X when initials already exist so the initials are shorter
- Extract the logic into a helper class so you can use it anywhere in your project
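The first two ideas could look something like this. It is an untested-in-production sketch, not code from the gist: `initialsFromParts()` and `withNumericSuffix()` are hypothetical names, and the input is assumed to be ASCII already.

```php
<?php
// Hypothetical helper: one initial per name part, splitting on
// whitespace, hyphens and apostrophes.
function initialsFromParts(string $firstName, string $lastName): string
{
    $parts = preg_split("/[\\s'-]+/", $firstName . ' ' . $lastName, -1, PREG_SPLIT_NO_EMPTY);
    $initials = '';

    foreach ($parts as $part) {
        $initials .= strtoupper($part[0]);
    }

    return $initials;
}

// Hypothetical helper: append the first free number (starting at 2)
// when the base initials are already taken.
function withNumericSuffix(string $base, array $existing): string
{
    if (!in_array($base, $existing, true)) {
        return $base;
    }

    $number = 2;

    while (in_array($base . $number, $existing, true)) {
        $number++;
    }

    return $base . $number;
}

$base = initialsFromParts('Jean-Claude', 'Van Damme');
$free = withNumericSuffix($base, []);
$taken = withNumericSuffix($base, ['JCVD', 'JCVD2']);
```

Wrapping both functions in a small class would cover the third idea and make them reusable outside the factory.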
The code of this factory is available on GitHub: https://gist.github.com/depsimon/d26d2809b9269fbe0f7f82b9d2ae2fc6