PHP Language Basics
Project: Data-Cleaning Script
A data-cleaning script takes messy input and turns it into a predictable shape. In real PHP work, this comes up when importing CSV files, processing form exports, tidying old database records, preparing data for an API, or fixing inconsistent values before a migration.
This mini-project keeps the input inside the script so you can focus on the language work. The important skills are trimming strings, normalising case, validating required values, looping through rows, returning clean arrays, and reporting what changed.
Start with messy rows
Raw data is rarely perfect. A customer export might contain extra spaces, mixed casing, empty names, and email addresses with capital letters.
<?php
declare(strict_types=1);
$rows = [
['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
['name' => ' ', 'email' => 'missing@example.com'],
];
Each row is an associative array. The script should clean the good rows and skip the row that does not have a usable name.
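Before writing any cleaning rules, it can help to look at the raw values with visible boundaries. This is a minimal sketch that assumes the same $rows array as above. Wrapping each value in square brackets makes leading and trailing spaces easy to spot. The $index variable is only for the printout.

```php
<?php
declare(strict_types=1);

$rows = [
    ['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
    ['name' => ' ', 'email' => 'missing@example.com'],
];

// Wrap each raw value in brackets so surrounding whitespace becomes visible.
foreach ($rows as $index => $row) {
    echo $index . ': [' . $row['name'] . '] [' . $row['email'] . ']' . PHP_EOL;
}

// Prints:
// 0: [ ada lovelace ] [ ADA@EXAMPLE.COM ]
// 1: [grace HOPPER] [grace@example.com]
// 2: [ ] [missing@example.com]
```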
Clean one value first
Do not start by writing a large loop. Write a small function that cleans one piece of data.
<?php
declare(strict_types=1);
function cleanName(string $name): string
{
$name = trim($name);
$name = strtolower($name);
return ucwords($name);
}
echo cleanName(' ada lovelace ');
// Prints:
// Ada Lovelace
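trim() removes whitespace (spaces, tabs, newlines, and a few other invisible characters) from the beginning and end of a string. strtolower() converts letters to lower case, which gives you a predictable base. ucwords() then capitalises the first letter of each word.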
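To see what each step contributes, the same three calls can be run one at a time with the intermediate values kept. This is a sketch for inspection only. The variable names are illustrative, and the brackets in the output just make the string boundaries visible.

```php
<?php
declare(strict_types=1);

$raw = ' grace HOPPER ';

// Step 1: remove surrounding whitespace.
$trimmed = trim($raw);
// Step 2: lower-case the letters for a predictable base.
$lowered = strtolower($trimmed);
// Step 3: capitalise the first letter of each word.
$titled = ucwords($lowered);

echo '[' . $trimmed . ']' . PHP_EOL;
echo '[' . $lowered . ']' . PHP_EOL;
echo '[' . $titled . ']' . PHP_EOL;

// Prints:
// [grace HOPPER]
// [grace hopper]
// [Grace Hopper]
```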
Clean email addresses differently
Not all fields use the same rule. Names might be title-cased, but email addresses are usually lower-cased and trimmed.
<?php
declare(strict_types=1);
function cleanEmail(string $email): string
{
return strtolower(trim($email));
}
echo cleanEmail(' ADA@EXAMPLE.COM ');
// Prints:
// ada@example.com
Small named functions make these rules easy to review. A future developer can change name cleaning without accidentally changing email cleaning.
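A quick way to see why the two rules stay separate is to run an email address through both functions. In this sketch the name rule capitalises the first letter, which is not what you want for an email address.

```php
<?php
declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

$email = ' ADA@EXAMPLE.COM ';

// The name rule title-cases the first letter: wrong for an email.
echo cleanName($email) . PHP_EOL;
// The email rule keeps everything lower case.
echo cleanEmail($email) . PHP_EOL;

// Prints:
// Ada@example.com
// ada@example.com
```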
Decide what counts as valid
Cleaning is not the same as accepting everything. If a required value becomes empty after trimming, the row should not be treated as valid.
<?php
declare(strict_types=1);
function hasRequiredName(array $row): bool
{
return trim($row['name'] ?? '') !== '';
}
var_dump(hasRequiredName(['name' => ' ']));
// Prints:
// bool(false)
The null coalescing operator ?? returns its right-hand value ('' here) when the key is missing or null, so the code never reads an undefined key. A missing name and a blank name both fail the check.
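Here is a small sketch that runs the check against three cases: a padded but valid name, a name made only of whitespace characters, and a row with no name key at all.

```php
<?php
declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    // ?? falls back to '' when the 'name' key is missing or null.
    return trim($row['name'] ?? '') !== '';
}

var_dump(hasRequiredName(['name' => ' Ada ']));          // valid name
var_dump(hasRequiredName(['name' => "\t \n"]));          // whitespace only
var_dump(hasRequiredName(['email' => 'a@example.com'])); // no 'name' key

// Prints:
// bool(true)
// bool(false)
// bool(false)
```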
Return cleaned rows
The main cleaning function should accept a list of rows and return a new list. That keeps the function easy to use and avoids surprising changes to the original data.
<?php
declare(strict_types=1);
function cleanName(string $name): string
{
return ucwords(strtolower(trim($name)));
}
function cleanEmail(string $email): string
{
return strtolower(trim($email));
}
function cleanCustomerRows(array $rows): array
{
$cleaned = [];
foreach ($rows as $row) {
if (trim($row['name'] ?? '') === '') {
continue;
}
$cleaned[] = [
'name' => cleanName($row['name']),
'email' => cleanEmail($row['email'] ?? ''),
];
}
return $cleaned;
}
$rows = [
['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
['name' => ' ', 'email' => 'missing@example.com'],
];
print_r(cleanCustomerRows($rows));
// Prints:
// Array
// (
// [0] => Array
// (
// [name] => Ada Lovelace
// [email] => ada@example.com
// )
// )
continue skips the rest of the loop body for the current row and moves on to the next one. An unusable row is left out of the result instead of stopping the whole import.
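This sketch checks two properties of the function. It reuses the same functions but places a valid row after the blank one. The row after the skipped one is still cleaned, and the caller's original $rows array keeps its messy values because the function builds and returns a new array.

```php
<?php
declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue; // skip this row, keep processing the rest
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$rows = [
    ['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => ' ', 'email' => 'missing@example.com'],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
];

$cleaned = cleanCustomerRows($rows);

// The row after the blank one is still processed.
echo count($cleaned) . ' cleaned' . PHP_EOL;
// The original array is untouched: it still holds the messy value.
echo '[' . $rows[0]['name'] . ']' . PHP_EOL;

// Prints:
// 2 cleaned
// [ ada lovelace ]
```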
Count what happened
A practical cleaning script should report what it did. Counts make the result easier to check.
<?php
declare(strict_types=1);
$totalRows = 3;
$cleanedRows = 2;
$skippedRows = $totalRows - $cleanedRows;
echo 'Read: ' . $totalRows . PHP_EOL;
echo 'Cleaned: ' . $cleanedRows . PHP_EOL;
echo 'Skipped: ' . $skippedRows . PHP_EOL;
// Prints:
// Read: 3
// Cleaned: 2
// Skipped: 1
Counts are useful in command-line scripts because they make manual verification quick. If a file has 500 rows and the script says it cleaned 40, you know something is wrong.
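The example above hard-codes the numbers to keep the output obvious. In a real script the counts would come from the data itself. Here is a minimal sketch that reuses the hasRequiredName() check from earlier and derives the totals with count() and array_filter().

```php
<?php
declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

$rows = [
    ['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
    ['name' => ' ', 'email' => 'missing@example.com'],
];

// Keep only rows that pass the name check, then count both arrays.
$validRows = array_filter($rows, 'hasRequiredName');

$totalRows = count($rows);
$cleanedRows = count($validRows);
$skippedRows = $totalRows - $cleanedRows;

echo 'Read: ' . $totalRows . PHP_EOL;
echo 'Cleaned: ' . $cleanedRows . PHP_EOL;
echo 'Skipped: ' . $skippedRows . PHP_EOL;

// Prints:
// Read: 3
// Cleaned: 2
// Skipped: 1
```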
Keep validation clear
For a beginner project, skipping invalid rows is acceptable if the script reports the skip. In more serious imports, you might collect row numbers and error messages.
<?php
declare(strict_types=1);
function validateCustomerRow(array $row): ?string
{
if (trim($row['name'] ?? '') === '') {
return 'Name is required.';
}
if (trim($row['email'] ?? '') === '') {
return 'Email is required.';
}
return null;
}
echo validateCustomerRow(['name' => '', 'email' => 'a@example.com']);
// Prints:
// Name is required.
Returning null for "no error" is a common, simple pattern. Larger applications may use exceptions, validation objects, or framework validators instead.
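For more serious imports, you might collect row numbers and error messages instead of only skipping rows. This is a minimal sketch of that idea built on validateCustomerRow(). It numbers rows from 1, which is an assumption based on how people usually count rows. A CSV file with a header line would need a different offset.

```php
<?php
declare(strict_types=1);

function validateCustomerRow(array $row): ?string
{
    if (trim($row['name'] ?? '') === '') {
        return 'Name is required.';
    }

    if (trim($row['email'] ?? '') === '') {
        return 'Email is required.';
    }

    return null;
}

$rows = [
    ['name' => 'Ada', 'email' => 'ada@example.com'],
    ['name' => ' ', 'email' => 'blank@example.com'],
    ['name' => 'Grace', 'email' => ''],
];

// Map row numbers (starting at 1) to error messages.
$errors = [];

foreach ($rows as $index => $row) {
    $error = validateCustomerRow($row);

    if ($error !== null) {
        $errors[$index + 1] = $error;
    }
}

foreach ($errors as $rowNumber => $message) {
    echo 'Row ' . $rowNumber . ': ' . $message . PHP_EOL;
}

// Prints:
// Row 2: Name is required.
// Row 3: Email is required.
```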
What a good solution includes
A good data-cleaning script should have small functions for each cleaning rule, a loop that handles every row, clear validation for required values, and a final report showing how many rows were read, cleaned, and skipped.
Before moving on, make sure you can explain why trim(), case normalisation, missing-key checks, and skipped-row counts matter in a real import or maintenance script.
Practice
Task: Clean Customer Names
Write a small PHP script that cleans customer names by trimming whitespace and normalising the casing.
Requirements
- Use declare(strict_types=1);
- Create a cleanName() function.
- Trim leading and trailing whitespace.
- Convert the name to lower case before capitalising words.
- Print the cleaned names.
Use this data:
<?php
declare(strict_types=1);
$names = [
' ada lovelace ',
'grace HOPPER',
' alan TURING',
];
// Your output should be:
// Ada Lovelace
// Grace Hopper
// Alan Turing
Solution
<?php
declare(strict_types=1);
function cleanName(string $name): string
{
return ucwords(strtolower(trim($name)));
}
$names = [
' ada lovelace ',
'grace HOPPER',
' alan TURING',
];
foreach ($names as $name) {
echo cleanName($name) . PHP_EOL;
}
// Prints:
// Ada Lovelace
// Grace Hopper
// Alan Turing
The function performs the cleaning in a predictable order: trim first, normalise to lower case, then capitalise each word.
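Here is a short sketch showing why the lower-case step comes before ucwords(). ucwords() only changes the first letter of each word, so any capitals already in the rest of the word stay as they are.

```php
<?php
declare(strict_types=1);

$name = 'grace HOPPER';

// Without strtolower(), ucwords() leaves the existing capitals alone.
echo ucwords(trim($name)) . PHP_EOL;
// Lower-casing first gives a predictable result.
echo ucwords(strtolower(trim($name))) . PHP_EOL;

// Prints:
// Grace HOPPER
// Grace Hopper
```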
Task: Predict Cleaning Counts
Read the script and predict the output before running it.
<?php
declare(strict_types=1);
function hasRequiredName(array $row): bool
{
return trim($row['name'] ?? '') !== '';
}
$rows = [
['name' => ' Ada ', 'email' => 'ada@example.com'],
['name' => ' ', 'email' => 'blank@example.com'],
['email' => 'missing-name@example.com'],
['name' => 'Grace', 'email' => 'grace@example.com'],
];
$cleaned = 0;
$skipped = 0;
foreach ($rows as $row) {
if (! hasRequiredName($row)) {
$skipped++;
continue;
}
$cleaned++;
}
echo 'Cleaned: ' . $cleaned . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;
// What does this print?
After predicting the output, explain why the row without a name key is skipped.
Solution
<?php
declare(strict_types=1);
function hasRequiredName(array $row): bool
{
return trim($row['name'] ?? '') !== '';
}
$rows = [
['name' => ' Ada ', 'email' => 'ada@example.com'],
['name' => ' ', 'email' => 'blank@example.com'],
['email' => 'missing-name@example.com'],
['name' => 'Grace', 'email' => 'grace@example.com'],
];
$cleaned = 0;
$skipped = 0;
foreach ($rows as $row) {
if (! hasRequiredName($row)) {
$skipped++;
continue;
}
$cleaned++;
}
echo 'Cleaned: ' . $cleaned . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;
// Prints:
// Cleaned: 2
// Skipped: 2
The row with spaces is skipped because trimming leaves an empty string. The row without a name key is also skipped because $row['name'] ?? '' falls back to an empty string.
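To see the fallback directly, this sketch evaluates the same expression on the row that has no name key and dumps the value that trim() receives.

```php
<?php
declare(strict_types=1);

$row = ['email' => 'missing-name@example.com'];

// ?? returns the right-hand side because 'name' is not set.
$name = $row['name'] ?? '';

var_dump($name);
var_dump(trim($name) !== '');

// Prints:
// string(0) ""
// bool(false)
```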
Task: Return Cleaned Name List
Write a cleanCustomerRows() function that returns a new list of cleaned customer rows.
Requirements
- Use declare(strict_types=1);
- Trim and title-case each valid customer name.
- Trim and lower-case each email address.
- Skip rows with an empty or missing name.
- Return the cleaned rows instead of printing inside the cleaning function.
- Print each cleaned customer after the function returns.
Use this data:
<?php
declare(strict_types=1);
$rows = [
['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
['name' => ' ', 'email' => 'blank@example.com'],
['name' => 'grace HOPPER', 'email' => 'GRACE@EXAMPLE.COM'],
];
// Your output should be:
// Ada Lovelace <ada@example.com>
// Grace Hopper <grace@example.com>
Solution
<?php
declare(strict_types=1);
function cleanName(string $name): string
{
return ucwords(strtolower(trim($name)));
}
function cleanEmail(string $email): string
{
return strtolower(trim($email));
}
function cleanCustomerRows(array $rows): array
{
$cleaned = [];
foreach ($rows as $row) {
if (trim($row['name'] ?? '') === '') {
continue;
}
$cleaned[] = [
'name' => cleanName($row['name']),
'email' => cleanEmail($row['email'] ?? ''),
];
}
return $cleaned;
}
$rows = [
['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
['name' => ' ', 'email' => 'blank@example.com'],
['name' => 'grace HOPPER', 'email' => 'GRACE@EXAMPLE.COM'],
];
$customers = cleanCustomerRows($rows);
foreach ($customers as $customer) {
echo $customer['name'] . ' <' . $customer['email'] . '>' . PHP_EOL;
}
// Prints:
// Ada Lovelace <ada@example.com>
// Grace Hopper <grace@example.com>
cleanCustomerRows() returns data instead of printing it. That makes the function reusable in a CLI script, an import job, or a later test.
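Because cleanCustomerRows() returns an array, a check can compare it with the expected result directly. This is a minimal sketch that uses a plain === comparison and repeats the solution's functions so it runs on its own. A real project would more likely use a test framework such as PHPUnit.

```php
<?php
declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue;
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$input = [
    ['name' => ' ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => ' ', 'email' => 'blank@example.com'],
];

$expected = [
    ['name' => 'Ada Lovelace', 'email' => 'ada@example.com'],
];

// === on arrays checks keys, values, order, and types.
echo (cleanCustomerRows($input) === $expected ? 'PASS' : 'FAIL') . PHP_EOL;

// Prints:
// PASS
```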