Channel: David Eastman, Author at The New Stack

Using AI to Help Developers Work with Regular Expressions


Regular expressions (regex) are an interesting area in this age of AI, because surely AI can analyse text well enough to make the tool itself redundant. The only problem is that as developers, we want to extract answers for our workflows, not stay within a user-centred solution styled by the likes of OpenAI. We know that AI providers will want you to stay in their environment as they seek to fully monetize their work, and will be less helpful to those who want to use AI for only a part of their system.

Let’s look at a simple problem. I want to quickly answer a simple question about travel within London. I want to extract the start point and destination from this sentence: “Hi, what is the best way to travel from Walthamstow to Pimlico?”

Now this isn’t a tough question, and if you put it lock, stock and barrel into an AI you will get an excellent reply, going far beyond the question if desired. You will, of course, receive no current information.

ChatGPT can very simply interrogate the sentence for meaning as well:

Of course, the answer is in text, and this is a direct service to the user; it is not intended that you should intervene. It is important to separate the ability to understand text (which is fuzzy, even if it appears accurate) from the ability to extract information exactly with an algorithm. They have different aims. The latter is brittle when used in real life. The former isn’t necessarily reliable, as we can’t assume the answer is correct (the same is true if I casually ask a human for information).

What we actually want to do is simply capture the text between the words “from” and “to”. This is a simple regex solution:

/from ([A-Z][a-z]*) to ([A-Z][a-z]*)/


Using regex101.com we can confirm this solution:

We can see from the information collated below that the two capture groups have caught the starting point and destination as expected:

I can then go on to feed this into the rest of my system, whatever that is.
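As a minimal sketch of that hand-off, the pattern above can be wrapped in a small JavaScript helper. The name findRoute and the shape of the returned object are my own assumptions for illustration; only the regular expression itself comes from the text above.

```javascript
// The pattern is the one shown above; findRoute and its return shape are hypothetical.
const ROUTE_PATTERN = /from ([A-Z][a-z]*) to ([A-Z][a-z]*)/;

function findRoute(sentence) {
  const match = sentence.match(ROUTE_PATTERN);
  if (match === null) {
    return null; // no "from X to Y" phrase found
  }
  // match[0] is the whole match; the two capture groups follow it.
  const [, start, destination] = match;
  return { start, destination };
}

const route = findRoute(
  "Hi, what is the best way to travel from Walthamstow to Pimlico?"
);
console.log(route); // { start: 'Walthamstow', destination: 'Pimlico' }
```

Returning null when nothing matches lets the rest of the system decide what to do with a sentence that is not about directions.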

So assuming we want to find a regex similar to the one above, we can go to the various AI tools and see if they can help. But what are we looking for AI to offer us overall?

  • Our primary interest is to allow a user to generate the correct regular expression, and then go on to adapt it into their own code.
  • We can indeed use AI to understand the sentiment of the sentence, and confirm that we are looking at directions, via a starting point and a destination. We saw ChatGPT do this above.
  • The user can learn regular expressions quickly while working with code.
  • As a natural outcome of tool use, we can see other possibilities or possibly issues. Ideas come from seeing side effects.
  • We can also analyze the different ways these apps address the problems that they purport to solve.

The first site I looked at was regex.ai; obviously, the domain name gives me confidence that we have come to the right place.

This site has an interesting approach: I create a phrase, add the words that I want to extract, and the site produces various attempts at a regex. Let’s use our travel example:

As you can see, I have isolated the starting point and destination as required.

Below this, I press the run button, wait a few seconds, and get the following:

Now you can see that the four different agents have made four different attempts, with the regular expression at the top of the column and the result below. As you can see, only one has got a correct response (Agent B’s two capture groups are correct) but even the failures are educative.

Interestingly if I click on Agent B’s attempt, it gives me this dialogue:

This proves that the AI has indeed understood the sentiment of the sentence correctly, though it has not quite grasped that I have highlighted the starting point and destination. The other three agents have different grasps on the meaning, which are quite interesting in themselves.

How good was the correct attempt? Technically it was sharper than my simple solution, because it used the word boundary (the \b at the beginning) and the whitespace matcher (\s) instead of just a space. As there are no valid place names that are just one capital letter, using [A-Z][a-z]+ is more sensible than [A-Z][a-z]*, but this is minor.
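The differences are easier to see side by side. The stricter pattern below is my own reconstruction from the description above (a leading \b, \s in place of literal spaces, and + in place of *). It is not necessarily the exact expression Agent B produced.

```javascript
// My original pattern: literal spaces, and * allows a bare capital letter.
const simple = /from ([A-Z][a-z]*) to ([A-Z][a-z]*)/;
// Assumed reconstruction of a stricter pattern (not Agent B's verbatim output).
const stricter = /\bfrom\s([A-Z][a-z]+)\sto\s([A-Z][a-z]+)/;

// A tab instead of a space: only \s copes with it.
const tabbed = "travel from\tWalthamstow to Pimlico";
console.log(simple.test(tabbed));   // false
console.log(stricter.test(tabbed)); // true

// Single capital letters: * accepts them, + does not.
const singleLetters = "from A to B";
console.log(simple.test(singleLetters));   // true
console.log(stricter.test(singleLetters)); // false

// "from" embedded in a longer word: \b rejects it.
const embedded = "wherefrom Walthamstow to Pimlico";
console.log(simple.test(embedded));   // true
console.log(stricter.test(embedded)); // false
```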

Let’s do the same exercise with another site. RegExGPT looks promising, although unlike regex.ai it does not ask me to mark up capture groups. It uses another method:

This is quite an innovative mix — so instead of focusing on the capture, I can ask for what I want it to do in English, while indicating the area I am referring to. Not only that, it gives me a worked code solution. And it worked well:

First of all, it even named the function “extractRoute”, which is a neat side effect of understanding context.

It also provided a full explanation for the regular expression it used:

…and even an example of use within JavaScript:

While the regular expression itself is a little complex, the fact that it has fully explained its reasoning is very impressive. The bit that has somewhat complicated the solution is the non-capturing group indicated by (?: ). Because regular expressions use brackets both to give priority to an expression and to capture, the non-capturing group does the first without the overhead of the second. The “from” and “to” do not need these, but the other use is interesting.
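To illustrate the difference in isolation, here is a small sketch of my own. The alternation with “leaving” is a hypothetical extension, not something RegExGPT produced. Grouping is needed so that the alternation applies only to the keyword, and the choice of bracket decides whether that keyword also takes up a capture slot.

```javascript
// Hypothetical example: accept either "from" or "leaving" before the place name.
const capturing = /(from|leaving) (\w+)/;      // the keyword becomes group 1
const nonCapturing = /(?:from|leaving) (\w+)/; // the keyword is grouped but not captured

const sentence = "travel from Walthamstow";

const withCapture = sentence.match(capturing);
console.log(withCapture[1]); // "from"
console.log(withCapture[2]); // "Walthamstow"

const withoutCapture = sentence.match(nonCapturing);
console.log(withoutCapture[1]);     // "Walthamstow"
console.log(withoutCapture.length); // 2 (whole match plus one group)
```

With the non-capturing form, the place name stays in group 1 however many keywords are added to the alternation.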

While the expression (\w+(?:\s\w+)*) looks complex, what it does is allow a multi-word expression to be caught, without getting the capture group wrong. Let us say that instead of Walthamstow, we had to travel from Green Park:

Note that the whitespace in “Green Park” has not caused an issue, which it would do in other solutions. Way to go.
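A sketch of the comparison follows. The function name extractRoute is the one the tool chose, but the surrounding pattern here is my assumption, built around the sub-expression quoted above, and may differ from the tool’s full output.

```javascript
// Assumed full pattern built around the quoted sub-expression (\w+(?:\s\w+)*).
const MULTI_WORD_ROUTE = /from (\w+(?:\s\w+)*) to (\w+(?:\s\w+)*)/;
// The earlier single-word pattern, for comparison.
const SINGLE_WORD_ROUTE = /from ([A-Z][a-z]*) to ([A-Z][a-z]*)/;

function extractRoute(sentence) {
  const match = MULTI_WORD_ROUTE.exec(sentence);
  return match ? { start: match[1], destination: match[2] } : null;
}

const question =
  "Hi, what is the best way to travel from Green Park to Pimlico?";

console.log(extractRoute(question));
// { start: 'Green Park', destination: 'Pimlico' }

console.log(SINGLE_WORD_ROUTE.test(question)); // false: "Green" is not followed by " to "
```

One caveat with this assumed pattern: because the second group also accepts further words, any text that follows the destination without punctuation would be captured along with it.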

While I am not sure either of these free-to-use tools has made regex necessarily any easier to use per se, both have done a very good job of exploring the problem area and solution. I am reminded of one of those film reviews that one can only fully appreciate after seeing the film. Not quite the intention, but interesting nevertheless. And we didn’t need to wait in an underground station for a Victoria line train to get to our destination.

The post Using AI to Help Developers Work with Regular Expressions appeared first on The New Stack.

David Eastman investigates AI tools that help developers generate a correct regular expression, and then adapt it into their own code.
