regex assistance
by Graham Freeman at 18:35 08/06/07 (Forum::Technical Advice::General)
Ages since I've used regex for more than single character matching ...
Hi all,

I have a really long string called $html which contains a whole web page from which I want to grab some info.

The format is sort of like this:

================
blahblahblah
info-1
info-2
info-3
tum-te-tum-te-tum
info-4
info-5
foobarfoobar
================

I want 2 arrays - one array that contains the info-n between blahblah and tum-te-tum, and another array that contains the info-n between tum-te-tum and foobar.

Can a master of regex show me how this is done please?!

It's PHP but that shouldn't matter much.

Thanks!!
--
gfreeman

Re: regex assistance Graham Freeman - 18:39 08/06/07
eep

that cut out my tags ...

blahblahblah
<tag>info-1</tag>
<tag>info-2</tag>
<tag>info-3</tag>
tum-te-tum-te-tum
<tag>info-4</tag>
<tag>info-5</tag>
foobarfoobar
--
gfreeman

Re: regex assistance Hugo van der Sanden - 13:32 10/06/07
The trouble with examples is that there are so many ways to read them.

It could be: get the chunk of text inside each <tag>...</tag>. Split the chunks into two lists, those before and those after the literal text 'tum-te-tum-te-tum'.

Or it could be: get the (exactly) 5 chunks of text in <tag>...</tag>, and put 3 in one array and 2 in another.

Or many other things, perhaps involving 'blah' and 'foo' and 'bar'.

I don't know PHP, but in Perl I probably wouldn't try to extract the info for the first of those definitions in a single pattern. I'd do something like:

my($head, $tail) = split m{tum-te-tum-te-tum}, $text, 2;
my @array1 = ($head =~ m{<tag>(.*?)</tag>}g);
my @array2 = ($tail =~ m{<tag>(.*?)</tag>}g);
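
For anyone not working in Perl, the same split-then-match idea can be sketched in Python. This is only an illustration of the first reading above, not a definitive solution: the function name split_tagged_chunks, its parameters, and the sample string are made up for the example, and like the Perl version it assumes each <tag>...</tag> chunk sits on a single line.

```python
import re

def split_tagged_chunks(text, divider="tum-te-tum-te-tum", tag="tag"):
    """Return two lists of <tag>...</tag> contents: before and after the divider.

    Hypothetical helper for illustration; names are not from the thread.
    """
    # Split once on the literal divider. If the divider is absent,
    # tail is an empty string and the second list comes back empty.
    head, _, tail = text.partition(divider)

    # Non-greedy capture of whatever sits between the opening and closing tag.
    # '.' does not match newlines here, mirroring the Perl pattern above.
    pattern = re.compile(r"<{0}>(.*?)</{0}>".format(re.escape(tag)))

    return pattern.findall(head), pattern.findall(tail)

# Sample input modelled on the example in the question.
html = """blahblahblah
<tag>info-1</tag>
<tag>info-2</tag>
<tag>info-3</tag>
tum-te-tum-te-tum
<tag>info-4</tag>
<tag>info-5</tag>
foobarfoobar
"""

before, after = split_tagged_chunks(html)
print(before)  # ['info-1', 'info-2', 'info-3']
print(after)   # ['info-4', 'info-5']
```

Splitting first keeps each pattern simple: one literal divider, then one small capture applied to each half. In PHP the equivalent would presumably be built from preg_split and preg_match_all, but that translation is untested here.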

In general though, coming up with a clear definition of what you need to match is about 90% of the work in writing the code. The other 10% is effectively a translation exercise.

Hugo

Re: regex assistance Bruce Ure - 18:40 08/06/07
I think you'll find this should do the trick:

$[\(':\[1..]^\\$_!"}](^~)]]_\$%

--

Re: regex assistance Graham Freeman - 18:42 08/06/07
I tried that, now my ISP is sending me a bill for a replacement server.
--
gfreeman