regex assistance
by Graham Freeman at 18:35 08/06/07 (Forum::Technical Advice::General)
Ages since I've used regex for more than single character matching ...
Hi all,

I have a really long string called $html which contains a whole web page from which I want to grab some info.

The format is sort of like this:


I want 2 arrays - one array that contains the info-n between blahblah and tum-te-tum, and another array that contains the info-n between tum-te-tum and foobar.

Can a master of regex show me how this is done please?!

It's PHP but that shouldn't matter much.


Re: regex assistance Graham Freeman - 18:39 08/06/07

that cut out my tags ...


Re: regex assistance Hugo van der Sanden - 13:32 10/06/07
The trouble with examples is that there are so many ways to read them.

It could be: get the chunk of text inside each <tag>...</tag>. Split the chunks into two lists, those before and those after the literal text 'tum-te-tum-te-tum'.

Or it could be: get the (exactly) 5 chunks of text in <tag>...</tag>, and put 3 in one array and 2 in another.

Or many other things, perhaps involving 'blah' and 'foo' and 'bar'.

I don't know PHP, but in Perl I probably wouldn't try to extract the info for the first of those definitions in one pattern. I'd do something like:

my($head, $tail) = split m{tum-te-tum-te-tum}, $text, 2;
my @array1 = ($head =~ m{<tag>(.*?)</tag>}g);
my @array2 = ($tail =~ m{<tag>(.*?)</tag>}g);
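
For the second reading (exactly 5 chunks, 3 in one array and 2 in the other), something along these lines might do. It is only a sketch: the sample $text is made up, since the tags in the original example were lost, and the 3/2 split is just my guess at what is wanted.

```perl
use strict;
use warnings;

# Hypothetical sample input: the original example's tags were lost,
# so this layout is an assumption.
my $text = 'blahblah <tag>info-1</tag><tag>info-2</tag><tag>info-3</tag> '
         . 'tum-te-tum-te-tum <tag>info-4</tag><tag>info-5</tag> foobar';

# Collect every chunk between <tag> and </tag>, in document order.
# /s lets . match newlines, in case a chunk spans several lines.
my @chunks = ($text =~ m{<tag>(.*?)</tag>}gs);

# This reading expects exactly five chunks; bail out otherwise.
die "expected 5 chunks, got " . scalar(@chunks) . "\n" unless @chunks == 5;

# The first three go in one array, the remaining two in the other.
my @array1 = @chunks[0 .. 2];
my @array2 = @chunks[3 .. 4];

print "array1: @array1\n";
print "array2: @array2\n";
```

The count check is there because this version depends on position rather than on the literal separator text, so it would quietly put things in the wrong array if the page ever had a different number of chunks.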

In general though, coming up with a clear definition of what you need to match is about 90% of the work in writing the code. The other 10% is effectively a translation exercise.


Re: regex assistance Bruce Ure - 18:40 08/06/07
I think you'll find this should do the trick:



Re: regex assistance Graham Freeman - 18:42 08/06/07
I tried that, now my ISP is sending me a bill for a replacement server.