
Reg-Ex Group Extractor

What is Reg-Ex Group Extractor?

Reg-Ex Group Extractor is a small program that extracts data from text strings using a technique called 'Regular Expression groups'. It can be a valuable tool for text manipulation. The purpose of the tool is best explained with an example. Assume you have a list of photo filenames and you want to separate each filename from its extension (JPG/RAW):

    _DSC3091.jpg
    _DSC3094.raw
    _DSC3095.jpg
    _DSC3097.raw
    extensionLessSample
    

This can be done by writing a small regular expression that defines two groups (variables), called 'filename' and 'extension': ^(?<filename>.+)\.(?<extension>.+)$. The program checks whether each line matches this overall Regular Expression and, if it does, splits the line into the defined groups. Lines that do not match are skipped and highlighted in red. In the example above, extensionLessSample is skipped because it contains no period. The results are shown in a table whose columns are named after the groups (variables) defined in the Regular Expression.
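The matching loop described above can be sketched in Python. This is not the program's own code (the tool is written in VB.NET). It is a minimal illustration that assumes Python's standard `re` module, where a named group is written `(?P<name>...)` instead of the .NET form `(?<name>...)`. The function name `extract_groups` is hypothetical.

```python
import re

# The sample expression rewritten for Python's re module: .NET writes a
# named group as (?<name>...), Python writes it as (?P<name>...).
PATTERN = re.compile(r"^(?P<filename>.+)\.(?P<extension>.+)$")

SOURCE = """_DSC3091.jpg
_DSC3094.raw
_DSC3095.jpg
_DSC3097.raw
extensionLessSample"""


def extract_groups(text, pattern):
    """Match every line of text against pattern (hypothetical helper).

    Returns (rows, skipped): rows holds one dict of named groups per matching
    line, skipped holds the lines that did not match.
    """
    rows, skipped = [], []
    for line in text.splitlines():
        # search() looks for a match anywhere in the line; the ^ and $
        # anchors in the pattern force the match to cover the whole line.
        match = pattern.search(line)
        if match:
            rows.append(match.groupdict())
        else:
            skipped.append(line)
    return rows, skipped


# Table columns: the group names, in the order they appear in the pattern.
columns = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)
rows, skipped = extract_groups(SOURCE, PATTERN)

print(columns)
for row in rows:
    print([row[name] for name in columns])
print("skipped:", skipped)
```

Sorting the group names by their group number keeps the table columns in the same order as the groups in the expression.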

Output Options

The program offers post-processing options to trim the data fields (remove leading and trailing spaces) and to exclude empty lines. You can also save the generated list as CSV data, or use it to build a new text string (see String Builder below). For my own convenience, I have also added a button that generates VB.NET source code, so the Regular Expression you have written can be implemented in your own program very quickly.
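The trimming, empty-line and CSV options can be approximated in a few lines of Python. This sketch is an assumption about the behavior, not the tool's implementation: it takes "exclude empty lines" to mean dropping blank source lines, and the names `extract_rows` and `to_csv` are hypothetical. The CSV output uses the standard `csv` module.

```python
import csv
import io
import re

PATTERN = re.compile(r"^(?P<filename>.+)\.(?P<extension>.+)$")


def extract_rows(text, pattern, trim=True, skip_empty=True):
    """Hypothetical post-processing: drop blank lines and trim each field."""
    rows = []
    for line in text.splitlines():
        if skip_empty and not line.strip():
            continue  # exclude empty (or whitespace-only) lines
        match = pattern.search(line)
        if match is None:
            continue  # lines that do not match are skipped
        row = match.groupdict()
        if trim:
            # Remove leading/trailing spaces; an unmatched optional group is None.
            row = {k: (v.strip() if v is not None else v) for k, v in row.items()}
        rows.append(row)
    return rows


def to_csv(rows, columns):
    """Serialize the table as CSV text with a header row of group names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


source = "  _DSC3091.jpg\n\n_DSC3094.raw  \n"
columns = sorted(PATTERN.groupindex, key=PATTERN.groupindex.get)
csv_text = to_csv(extract_rows(source, PATTERN), columns)
print(csv_text)
```

Without trimming, the leading spaces would end up inside the `filename` field and the trailing spaces inside `extension`, because `.+` also matches spaces.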

String Builder

The String Builder lets you combine the extracted variables into a completely new, customizable string. A very basic example, built from the filename and extension groups above:

    _DSC3091 is a jpg
    _DSC3094 is a raw
    _DSC3095 is a jpg
    _DSC3097 is a raw
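A Python sketch that produces the same output by substituting the named groups into a template. The template syntax `{filename} is a {extension}` is Python's `str.format`, chosen for illustration only; the String Builder's own placeholder syntax is not shown on this page, and `build_strings` is a hypothetical name.

```python
import re

PATTERN = re.compile(r"^(?P<filename>.+)\.(?P<extension>.+)$")
# Hypothetical template in str.format syntax, not the tool's own syntax.
TEMPLATE = "{filename} is a {extension}"

lines = ["_DSC3091.jpg", "_DSC3094.raw", "_DSC3095.jpg", "_DSC3097.raw"]


def build_strings(lines, pattern, template):
    """Build one output string per matching line from its named groups."""
    output = []
    for line in lines:
        match = pattern.search(line)
        if match:
            # Each {name} placeholder is replaced by the group with that name.
            output.append(template.format(**match.groupdict()))
    return output


for text in build_strings(lines, PATTERN, TEMPLATE):
    print(text)
```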
    

More about Regular Expressions

Before you can use this program, you need some basic knowledge of Regular Expressions. Entire books have been written about them, so I am not going to explain them in detail here.

In short, the program splits the input data into separate lines. It then examines each line and checks whether it matches the specified Regular Expression. If it does, the groups (variables) are extracted and added to the table.

A named group is created by placing it between round brackets (parentheses). The opening bracket is followed by a question mark, then by the name of the group between < and >, and finally by the Regular Expression that describes the group: (?<name>some_reg_ex).
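The same construct in Python, shown side by side with the .NET form used by the tool. Python's `re` module needs an extra `P` after the question mark. The groups `prefix` and `number` are made up for illustration; they split a photo filename into its letter prefix and its numeric part.

```python
import re

# .NET / this tool:  (?<prefix>[A-Z_]+)(?<number>\d+)
# Python's re:       (?P<prefix>[A-Z_]+)(?P<number>\d+)
pattern = re.compile(r"(?P<prefix>[A-Z_]+)(?P<number>\d+)")

match = pattern.search("_DSC3091.jpg")
print(match.group("prefix"))  # _DSC
print(match.group("number"))  # 3091
print(match.groupdict())      # all named groups as a dict
```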

Regular Expressions have many reserved characters (metacharacters), such as the period (dot), which means 'any character'. If you want to match a literal period, you need to escape it by preceding it with a backslash: \.
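A short Python check of the difference between an unescaped and an escaped period. The raw-string prefix `r"..."` keeps Python itself from interpreting the backslash, and `re.escape` (part of the standard library) escapes every metacharacter in a literal string. The filenames are illustrative only.

```python
import re

# Unescaped: "." matches any character, so "DSC.jpg" also matches "DSCxjpg".
print(bool(re.fullmatch(r"DSC.jpg", "DSCxjpg")))   # True
# Escaped: "\." only matches a literal period.
print(bool(re.fullmatch(r"DSC\.jpg", "DSCxjpg")))  # False
print(bool(re.fullmatch(r"DSC\.jpg", "DSC.jpg")))  # True

# re.escape() adds a backslash before each metacharacter in a literal string.
print(re.escape("photo.v2+.jpg"))
```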

Sample Regular Expressions

Here you can find some sample applications in which data can be split using Regular Expressions. I kept the expressions as simple as possible, but in a real-world application you must make sure an expression is not too "greedy", that is, that a quantifier such as .+ does not match more text than intended.
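Greediness becomes visible with the filename expression from the introduction as soon as a filename contains more than one period. This Python sketch compares the greedy `.+` with the lazy `.+?`; the filename `_DSC3091.backup.jpg` is made up for illustration.

```python
import re

line = "_DSC3091.backup.jpg"

# Greedy: .+ grabs as much as possible, so the filename runs up to the LAST period.
greedy = re.search(r"^(?P<filename>.+)\.(?P<extension>.+)$", line)
# Lazy: .+? grabs as little as possible, so the filename stops at the FIRST period.
lazy = re.search(r"^(?P<filename>.+?)\.(?P<extension>.+)$", line)

print(greedy.groupdict())  # {'filename': '_DSC3091.backup', 'extension': 'jpg'}
print(lazy.groupdict())    # {'filename': '_DSC3091', 'extension': 'backup.jpg'}
```

Neither variant is "correct" in general; which one you need depends on whether the extension should be everything after the first period or only the part after the last one.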

Download

Reg-Ex Group Extractor is written in Visual Basic .NET 4.0. The complete setup wizard is available for download.

Copyright ©1998-2022 Vanderhaegen Bart - last modified: August 11, 2017