PHP Language Basics

Project: Data-Cleaning Script

A data-cleaning script takes messy input and turns it into a predictable shape. In real PHP work, this comes up when importing CSV files, processing form exports, tidying old database records, preparing data for an API, or fixing inconsistent values before a migration.

This mini-project keeps the input inside the script so you can focus on the language work. The important skills are trimming strings, normalising case, validating required values, looping through rows, returning clean arrays, and reporting what changed.

Start with messy rows

Raw data is rarely perfect. A customer export might contain extra spaces, mixed casing, empty names, and email addresses with capital letters.

PHP example
<?php

declare(strict_types=1);

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
    ['name' => '   ', 'email' => 'missing@example.com'],
];

Each row is an associative array. The script should clean the good rows and skip the row that does not have a usable name.

Clean one value first

Do not start by writing a large loop. Write a small function that cleans one piece of data.

PHP example
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    $name = trim($name);
    $name = strtolower($name);

    return ucwords($name);
}

echo cleanName('  ada lovelace ');

// Prints:
// Ada Lovelace

trim() removes whitespace from the beginning and end. strtolower() gives you a predictable base. ucwords() then capitalises each word.
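To see what each call contributes, you can run the three steps separately and dump every intermediate value. This is a small illustrative sketch. The input string, with a tab at the end and extra spaces in the middle, is made up for the demonstration. Notice that trim() only touches the ends, so the spaces between the words survive.

```php
<?php

declare(strict_types=1);

// Made-up messy input: leading spaces, extra inner spaces, trailing tab.
$raw = "  ada   LOVELACE\t";

$trimmed = trim($raw);            // strips both ends; inner spaces stay
$lowered = strtolower($trimmed);  // predictable lower-case base
$titled = ucwords($lowered);      // capitalise the first letter of each word

var_dump($trimmed, $lowered, $titled);

// Prints:
// string(14) "ada   LOVELACE"
// string(14) "ada   lovelace"
// string(14) "Ada   Lovelace"
```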

Clean email addresses differently

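Not all fields use the same rule. Names might be title-cased, but email addresses are usually trimmed and lower-cased. Strictly speaking, the part before the @ is allowed to be case-sensitive, but nearly all mail providers ignore case there, so lower-casing is a common normalisation.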

PHP example
<?php

declare(strict_types=1);

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

echo cleanEmail(' ADA@EXAMPLE.COM ');

// Prints:
// ada@example.com

Small named functions make these rules easy to review. A future developer can change name cleaning without accidentally changing email cleaning.
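For example, suppose a later requirement says hyphenated names such as "mary-jane" should become "Mary-Jane". By default ucwords() only starts a new word after whitespace, but it accepts an optional second argument listing the separator characters. In this sketch the hyphen rule is a hypothetical change, not part of the original project. Only cleanName() is edited; cleanEmail() stays exactly as it was.

```php
<?php

declare(strict_types=1);

// Hypothetical revised name rule: also capitalise after a hyphen.
// The separator list keeps ucwords()'s default whitespace characters and adds '-'.
function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)), " \t\r\n\f\v-");
}

// The email rule is untouched by the change above.
function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

echo cleanName('  mary-jane WATSON ') . PHP_EOL;
echo cleanEmail(' MJ@EXAMPLE.COM ') . PHP_EOL;

// Prints:
// Mary-Jane Watson
// mj@example.com
```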

Decide what counts as valid

Cleaning is not the same as accepting everything. If a required value becomes empty after trimming, the row should not be treated as valid.

PHP example
<?php

declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

var_dump(hasRequiredName(['name' => '   ']));

// Prints:
// bool(false)

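The null coalescing operator ?? falls back to '' when the name key is missing or holds null. Without it, PHP would raise an "Undefined array key" warning, and under strict_types=1 passing the resulting null to trim() throws a TypeError. A missing name and a blank name both fail the check.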
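A quick way to convince yourself is to run the same check against each kind of unusable row. The sample rows and labels here are illustrative only; the null case shows that ?? treats a key holding null the same as a missing key.

```php
<?php

declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

// Illustrative rows: each one tests a different situation.
$cases = [
    'missing key' => ['email' => 'a@example.com'],
    'null value' => ['name' => null],
    'blank value' => ['name' => "  \t "],
    'real name' => ['name' => ' Ada '],
];

foreach ($cases as $label => $row) {
    echo $label . ': ' . (hasRequiredName($row) ? 'valid' : 'invalid') . PHP_EOL;
}

// Prints:
// missing key: invalid
// null value: invalid
// blank value: invalid
// real name: valid
```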

Return cleaned rows

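The main cleaning function should accept a list of rows and return a new list. That keeps the function easy to use, and because PHP passes arrays by value (unless a parameter is declared with &), the caller's original $rows stays untouched and is still available for comparison or reporting.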

PHP example
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue;
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => '   ', 'email' => 'missing@example.com'],
];

print_r(cleanCustomerRows($rows));

// Prints:
// Array
// (
//     [0] => Array
//         (
//             [name] => Ada Lovelace
//             [email] => ada@example.com
//         )
// )

continue jumps straight to the next row, so an unusable row is left out of $cleaned instead of becoming a customer with an empty name. The rest of the import carries on.
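Because cleanCustomerRows() builds and returns a separate array, the raw input is still there after the call. This sketch repeats the functions from the example above and prints one raw value next to its cleaned version; the bracketed echo lines are only for demonstration, so the surrounding spaces are visible.

```php
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue; // skip rows with a missing or blank name
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => '   ', 'email' => 'missing@example.com'],
];

$customers = cleanCustomerRows($rows);

// $rows still holds the raw values; $customers holds the cleaned copy.
echo '[' . $rows[0]['name'] . '] -> [' . $customers[0]['name'] . ']' . PHP_EOL;
echo '[' . $rows[0]['email'] . '] -> [' . $customers[0]['email'] . ']' . PHP_EOL;

// Prints:
// [  ada lovelace ] -> [Ada Lovelace]
// [ ADA@EXAMPLE.COM ] -> [ada@example.com]
```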

Count what happened

A practical cleaning script should report what it did. Counts make the result easier to check.

PHP example
<?php

declare(strict_types=1);

// Hard-coded for illustration. A real script would call count() on the input and cleaned arrays.
$totalRows = 3;
$cleanedRows = 2;
$skippedRows = $totalRows - $cleanedRows;

echo 'Read: ' . $totalRows . PHP_EOL;
echo 'Cleaned: ' . $cleanedRows . PHP_EOL;
echo 'Skipped: ' . $skippedRows . PHP_EOL;

// Prints:
// Read: 3
// Cleaned: 2
// Skipped: 1

Counts are useful in command-line scripts because they make manual verification quick. If a file has 500 rows and the script says it cleaned 40, you know something is wrong.
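Rather than typing the numbers in, a script can derive them from the arrays it already has, so the report cannot drift out of sync with the data. This is a sketch that reuses the cleaning functions from earlier with the three sample rows from the start of the project; count() returns the number of elements in an array.

```php
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue;
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
    ['name' => '   ', 'email' => 'missing@example.com'],
];

$cleaned = cleanCustomerRows($rows);

// Derive the report numbers from the data instead of hard-coding them.
$totalRows = count($rows);
$cleanedRows = count($cleaned);
$skippedRows = $totalRows - $cleanedRows;

echo 'Read: ' . $totalRows . PHP_EOL;
echo 'Cleaned: ' . $cleanedRows . PHP_EOL;
echo 'Skipped: ' . $skippedRows . PHP_EOL;

// Prints:
// Read: 3
// Cleaned: 2
// Skipped: 1
```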

Keep validation clear

For a beginner project, skipping invalid rows is acceptable if the script reports the skip. In more serious imports, you might collect row numbers and error messages.

PHP example
<?php

declare(strict_types=1);

function validateCustomerRow(array $row): ?string
{
    if (trim($row['name'] ?? '') === '') {
        return 'Name is required.';
    }

    if (trim($row['email'] ?? '') === '') {
        return 'Email is required.';
    }

    return null;
}

echo validateCustomerRow(['name' => '', 'email' => 'a@example.com']);

// Prints:
// Name is required.

Returning null for "no error" is a common simple pattern. Later, larger applications may use exceptions, validation objects, or framework validators.
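For the more serious imports mentioned above, one option is to keep a map from row number to error message while looping. This is a minimal sketch: the sample rows are invented, and the 1-based numbering ($index + 1) is an assumption. If the data came from a CSV file with a header line, you might prefer $index + 2 so the numbers match the file's line numbers.

```php
<?php

declare(strict_types=1);

function validateCustomerRow(array $row): ?string
{
    if (trim($row['name'] ?? '') === '') {
        return 'Name is required.';
    }

    if (trim($row['email'] ?? '') === '') {
        return 'Email is required.';
    }

    return null;
}

// Invented sample data: one good row, one blank name, one missing email.
$rows = [
    ['name' => 'Ada', 'email' => 'ada@example.com'],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['name' => 'Grace'],
];

// Row number (1-based, an assumption) => error message.
$errors = [];

foreach ($rows as $index => $row) {
    $error = validateCustomerRow($row);

    if ($error !== null) {
        $errors[$index + 1] = $error;
    }
}

foreach ($errors as $rowNumber => $message) {
    echo 'Row ' . $rowNumber . ': ' . $message . PHP_EOL;
}

// Prints:
// Row 2: Name is required.
// Row 3: Email is required.
```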

What a good solution includes

A good data-cleaning script should have small functions for each cleaning rule, a loop that handles every row, clear validation for required values, and a final report showing how many rows were read, cleaned, and skipped.
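Putting those pieces together, a complete script might look like the sketch below. It is one possible arrangement, not the only correct answer: it validates each row with validateCustomerRow(), cleans the valid ones, remembers why each skipped row was rejected, and prints the read/cleaned/skipped report at the end. The "Row N skipped" message format is an illustrative choice.

```php
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function validateCustomerRow(array $row): ?string
{
    if (trim($row['name'] ?? '') === '') {
        return 'Name is required.';
    }

    if (trim($row['email'] ?? '') === '') {
        return 'Email is required.';
    }

    return null;
}

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => 'grace HOPPER', 'email' => 'grace@example.com'],
    ['name' => '   ', 'email' => 'missing@example.com'],
];

$cleaned = [];
$skipped = []; // row number => reason

foreach ($rows as $index => $row) {
    $error = validateCustomerRow($row);

    if ($error !== null) {
        $skipped[$index + 1] = $error;
        continue;
    }

    // Validation guarantees both keys exist and are not blank here.
    $cleaned[] = [
        'name' => cleanName($row['name']),
        'email' => cleanEmail($row['email']),
    ];
}

echo 'Read: ' . count($rows) . PHP_EOL;
echo 'Cleaned: ' . count($cleaned) . PHP_EOL;
echo 'Skipped: ' . count($skipped) . PHP_EOL;

foreach ($skipped as $rowNumber => $reason) {
    echo 'Row ' . $rowNumber . ' skipped: ' . $reason . PHP_EOL;
}

// Prints:
// Read: 3
// Cleaned: 2
// Skipped: 1
// Row 3 skipped: Name is required.
```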

Before moving on, make sure you can explain why trim(), case normalisation, missing-key checks, and skipped-row counts matter in a real import or maintenance script.

Practice

Task: Clean Customer Names

Write a small PHP script that cleans customer names by trimming whitespace and normalising the casing.

Requirements

  • Use declare(strict_types=1);.
  • Create a cleanName() function.
  • Trim leading and trailing whitespace.
  • Convert the name to lower case before capitalising words.
  • Print the cleaned names.

Use this data:

PHP example
<?php

declare(strict_types=1);

$names = [
    '  ada lovelace ',
    'grace HOPPER',
    '  alan TURING',
];

// Your output should be:
// Ada Lovelace
// Grace Hopper
// Alan Turing

Solution
PHP example
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

$names = [
    '  ada lovelace ',
    'grace HOPPER',
    '  alan TURING',
];

foreach ($names as $name) {
    echo cleanName($name) . PHP_EOL;
}

// Prints:
// Ada Lovelace
// Grace Hopper
// Alan Turing

The function performs the cleaning in a predictable order: trim first, normalise to lower case, then capitalise each word.
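The order of strtolower() and ucwords() matters because each call overwrites the casing the other produced. This small comparison uses one of the practice names to show the correct order, the reversed order, and what happens if the lower-casing step is left out.

```php
<?php

declare(strict_types=1);

$name = 'grace HOPPER';

// Correct order: lower-case everything, then capitalise each word.
echo ucwords(strtolower($name)) . PHP_EOL;

// Reversed order: strtolower() undoes the capitals ucwords() just added.
echo strtolower(ucwords($name)) . PHP_EOL;

// No lower-casing: ucwords() leaves existing capitals alone.
echo ucwords($name) . PHP_EOL;

// Prints:
// Grace Hopper
// grace hopper
// Grace HOPPER
```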

Task: Predict Cleaning Counts

Read the script and predict the output before running it.

PHP example
<?php

declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

$rows = [
    ['name' => '  Ada ', 'email' => 'ada@example.com'],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['email' => 'missing-name@example.com'],
    ['name' => 'Grace', 'email' => 'grace@example.com'],
];

$cleaned = 0;
$skipped = 0;

foreach ($rows as $row) {
    if (! hasRequiredName($row)) {
        $skipped++;
        continue;
    }

    $cleaned++;
}

echo 'Cleaned: ' . $cleaned . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;

// What does this print?

After predicting the output, explain why the row without a name key is skipped.

Solution
PHP example
<?php

declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

$rows = [
    ['name' => '  Ada ', 'email' => 'ada@example.com'],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['email' => 'missing-name@example.com'],
    ['name' => 'Grace', 'email' => 'grace@example.com'],
];

$cleaned = 0;
$skipped = 0;

foreach ($rows as $row) {
    if (! hasRequiredName($row)) {
        $skipped++;
        continue;
    }

    $cleaned++;
}

echo 'Cleaned: ' . $cleaned . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;

// Prints:
// Cleaned: 2
// Skipped: 2

The row with spaces is skipped because trimming leaves an empty string. The row without a name key is also skipped because $row['name'] ?? '' falls back to an empty string.
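If you only need the counts, an alternative to the manual loop is array_filter(), which keeps the elements for which a callback returns true. This sketch passes the function name as a string callable; note that array_filter() preserves the original keys, so the kept rows are at keys 0 and 3 rather than being renumbered.

```php
<?php

declare(strict_types=1);

function hasRequiredName(array $row): bool
{
    return trim($row['name'] ?? '') !== '';
}

$rows = [
    ['name' => '  Ada ', 'email' => 'ada@example.com'],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['email' => 'missing-name@example.com'],
    ['name' => 'Grace', 'email' => 'grace@example.com'],
];

// Keep only rows that pass the check (original keys are preserved).
$valid = array_filter($rows, 'hasRequiredName');

$cleaned = count($valid);
$skipped = count($rows) - $cleaned;

echo 'Cleaned: ' . $cleaned . PHP_EOL;
echo 'Skipped: ' . $skipped . PHP_EOL;

// Prints:
// Cleaned: 2
// Skipped: 2
```

The explicit foreach loop is still useful when you also want to clean each row or record why it was skipped; array_filter() only answers "which rows pass?".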

Task: Return Cleaned Name List

Write a cleanCustomerRows() function that returns a new list of cleaned customer rows.

Requirements

  • Use declare(strict_types=1);.
  • Trim and title-case each valid customer name.
  • Trim and lower-case each email address.
  • Skip rows with an empty or missing name.
  • Return the cleaned rows instead of printing inside the cleaning function.
  • Print each cleaned customer after the function returns.

Use this data:

PHP example
<?php

declare(strict_types=1);

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['name' => 'grace HOPPER', 'email' => 'GRACE@EXAMPLE.COM'],
];

// Your output should be:
// Ada Lovelace <ada@example.com>
// Grace Hopper <grace@example.com>

Solution
PHP example
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue;
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$rows = [
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['name' => 'grace HOPPER', 'email' => 'GRACE@EXAMPLE.COM'],
];

$customers = cleanCustomerRows($rows);

foreach ($customers as $customer) {
    echo $customer['name'] . ' <' . $customer['email'] . '>' . PHP_EOL;
}

// Prints:
// Ada Lovelace <ada@example.com>
// Grace Hopper <grace@example.com>

cleanCustomerRows() returns data instead of printing it. That makes the function reusable in a CLI script, an import job, or a later test.
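Because the function returns an array, a check can compare that array directly with the expected result. In a real project this would usually live in a test framework such as PHPUnit; the plain === comparison below is a dependency-free sketch. For arrays, === is true only when both have the same key/value pairs in the same order with the same types.

```php
<?php

declare(strict_types=1);

function cleanName(string $name): string
{
    return ucwords(strtolower(trim($name)));
}

function cleanEmail(string $email): string
{
    return strtolower(trim($email));
}

function cleanCustomerRows(array $rows): array
{
    $cleaned = [];

    foreach ($rows as $row) {
        if (trim($row['name'] ?? '') === '') {
            continue;
        }

        $cleaned[] = [
            'name' => cleanName($row['name']),
            'email' => cleanEmail($row['email'] ?? ''),
        ];
    }

    return $cleaned;
}

$expected = [
    ['name' => 'Ada Lovelace', 'email' => 'ada@example.com'],
    ['name' => 'Grace Hopper', 'email' => 'grace@example.com'],
];

$actual = cleanCustomerRows([
    ['name' => '  ada lovelace ', 'email' => ' ADA@EXAMPLE.COM '],
    ['name' => '   ', 'email' => 'blank@example.com'],
    ['name' => 'grace HOPPER', 'email' => 'GRACE@EXAMPLE.COM'],
]);

// Strict comparison: same keys, values, order, and types.
echo ($actual === $expected ? 'PASS' : 'FAIL') . PHP_EOL;

// Prints:
// PASS
```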