
I need to check whether a given string is a valid URL. For my purposes, the following and any similar combinations are all valid URLs:

'https://example.com/api/',
'https://www.example.com/test-subpath',
'https://www.example.com',
'example.com/test/page',
'www.example.com',
'www.subdomain.example.com',
'https://www.subdomain.example.com',
'subdomain.example.com',
'http://subdomain.example.com',
'https://subdomain.example.com'

while

'user-service/api/'

is invalid. I tried both parse_url() and filter_var($url, FILTER_VALIDATE_URL), but neither worked.

Thanks in advance.

  • 2
    Works as expected: 3v4l.org/LpLl2. www.example.com is not a URL, because it is a hostname. Commented Jul 6 at 8:44
  • 4
    Why regex? If the URL fails, add a protocol and check again. 3v4l.org/oX2as Commented Jul 6 at 9:34
  • 2
    You should run with @MarkusZeller's suggestion. Creating regex for something like this isn't as simple as you might first think. There are many variants and you need to know not only what characters are valid but also where in the URL they are valid. People have lost their minds for less. Commented Jul 6 at 9:43
  • 1
    You have not described the rules of what to match and what not to match. All you have provided are some examples. For those examples, just searching for a dot will give you the correct answer. Commented Jul 6 at 9:53
  • 1
    Please don't invent your own definition of a "valid" URL. First, start with processing the string into a UriInterface (see php-fig.org/psr/psr-7). Then, if you want to impose additional restrictions on the resulting valid URL, make those explicitly. Commented Jul 7 at 7:59

2 Answers

1

Your script and the filter method work as expected. Because www.example.com is just a hostname, and therefore only part of a URL, the check has to return false.

The definition of a URL says:

<scheme>:<scheme-specific-part>
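
As a small illustration (a sketch that is not part of the linked demo), parse_url() shows what happens when the scheme is missing: the whole string is treated as a path, so no host is found. The variable names $bare and $full are only for this example.

```php
<?php
// Without a scheme, parse_url() cannot identify a host,
// so the whole string ends up in the 'path' component.
$bare = parse_url('www.example.com');

// With a scheme, the same hostname is recognised as the host.
$full = parse_url('https://www.example.com');

var_dump(isset($bare['host'])); // bool(false)
var_dump($bare['path']);
var_dump($full['scheme'], $full['host']);
```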

In your case, you can apply the following rule set:

(Demo: https://3v4l.org/oX2as)

  • Check if the URL is valid
    • yes -> return true
    • no -> add the scheme and check again
      • yes -> return true
      • no -> validation failed

As already stated in the comments, using a regex is not a good idea: it quickly becomes overcomplicated when it has to cover all the rules. The same goes for custom functions, which tend to miss checks, for example for character sets.

Whenever possible, use the well-known, well-tested functions that PHP supplies.

$urls = [
    'https://example.com/api/',
    'https://www.example.com/test-subpath',
    'https://www.example.com',
    'example.com/test/page',
    'www.example.com',
    'www.subdomain.example.com',
    'https://www.subdomain.example.com',
    'subdomain.example.com',
    'http://subdomain.example.com',
    'https://subdomain.example.com',
];

function checkUrl(string $url): bool
{
    return (bool)filter_var($url, FILTER_VALIDATE_URL);
}

foreach ($urls as $url) {
    if (checkUrl($url)) {
        echo $url . ' is a valid URL', PHP_EOL;
        continue;
    }

    $urlWithProtocol = 'https://' . $url;
    if (checkUrl($urlWithProtocol)) {
        echo $urlWithProtocol, ' with added protocol is a valid URL', PHP_EOL;
        continue;
    }

    echo $url, ' is not a valid URL', PHP_EOL;
}

will result in

https://example.com/api/ is a valid URL
https://www.example.com/test-subpath is a valid URL
https://www.example.com is a valid URL
https://example.com/test/page with added protocol is a valid URL
https://www.example.com with added protocol is a valid URL
https://www.subdomain.example.com with added protocol is a valid URL
https://www.subdomain.example.com is a valid URL
https://subdomain.example.com with added protocol is a valid URL
http://subdomain.example.com is a valid URL
https://subdomain.example.com is a valid URL
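
One caveat with the rule set above: once https:// is prepended, the string 'user-service/api/' from the question becomes https://user-service/api/, which FILTER_VALIDATE_URL accepts, because a host without a dot (like localhost) is syntactically valid. Below is a minimal sketch of one way to reject it, assuming that "valid" in the question additionally means the host contains at least one dot. isAcceptableUrl() is a hypothetical helper name, not something from the demo.

```php
<?php
// Hypothetical helper: applies the rule set above, then additionally
// requires a dot in the host (an assumption about the asker's intent).
function isAcceptableUrl(string $url): bool
{
    // Try the string as-is first, then with a scheme prepended.
    foreach ([$url, 'https://' . $url] as $candidate) {
        if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
            continue;
        }

        $host = parse_url($candidate, PHP_URL_HOST);

        // Reject dot-less hosts such as "user-service" or "localhost".
        return is_string($host) && strpos($host, '.') !== false;
    }

    return false;
}

var_dump(isAcceptableUrl('example.com/test/page')); // bool(true)
var_dump(isAcceptableUrl('user-service/api/'));     // bool(false)
```

Whether a dot in the host is the right extra restriction depends on your requirements; the point is to make any such rule explicit on top of the built-in validation rather than replacing it.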
0

This is a simple example for validating your URL patterns. Note that it only checks, case-insensitively, whether the string contains example.com.

<?php
$re = '#example[.]com#i';
$str = 'subdomain.example.com';

preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);

// Print the entire match result
var_dump($matches);
?>

Hope this helps. Thank you.
