
Extracting Text


The Text tab is where you define line numbers, equipment tags, and other text data that you want to embed as additional Symbol attribute data. Use it for text data that cannot be extracted automatically with the Find Attributes tool.

Text review on the Text tab

The following guides explain how to extract text in your projects.


Extracting Text

There are two ways to extract text based on the scenario:

Here's how to extract multiple text blocks using the Search feature. Use this approach where there are numerous instances of similar text.

  1. Open a project.
  2. Open the Text tab.
  3. Open a diagram and locate the text you want to extract.
  4. Click the Search button on the toolbar.

    The Search window displays.

  5. Under Category Selection, select the type of text you are extracting.

    Important

    If you are extracting line numbers and want to associate the line number data with symbols, you must select Line Number as the category.

  6. In Pattern, enter a regex pattern to help identify similar text.

    Line Number Regex

    The regex logic for Line Numbers omits whitespace from the detected strings. Avoid using \s (the whitespace wildcard) in such searches, as it will return unexpected results.

  7. Click SEARCH ALL PAGES or SEARCH CURRENT PAGE.

    The Review window displays with the first match. The text is also highlighted on the diagram for reference.

    Text review

  8. Review the matches:

    Info

    • To approve a match, ensure that the correct Id is extracted and adjust if necessary, then click APPROVE.
    • To approve all matches, click next to APPROVE, select Approve All, then click APPROVE ALL.
    • To reject a match, click REJECT.
    • To end the review at any time, click STOP REVIEW.

    Tips

    • Once you approve or reject a match, the next match displays for review. Continue until you review the full list of matches.
    • As you approve matches, they get added to the list on the sidebar.
    • Successful searches can be saved to Templates to be applied during project Processing.
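
The Line Number Regex note in step 6 can be illustrated outside DataSeer. The following is a minimal Python sketch, assuming that detected line-number strings have their whitespace removed before the pattern is applied. The `strip_whitespace` helper and the sample string are hypothetical and are not part of DataSeer.

```python
import re

def strip_whitespace(text: str) -> str:
    """Remove all whitespace, mimicking (by assumption) how line-number strings are normalized."""
    return re.sub(r"\s+", "", text)

# Hypothetical line number as it might appear on a diagram, with a stray space.
raw = '1/2"-MCH -DL74-4201'
normalized = strip_whitespace(raw)

# A pattern that expects whitespace has nothing to match in the normalized string.
with_space = re.search(r"MCH\s-DL74", normalized)
# The same pattern without \s matches.
without_space = re.search(r"MCH-DL74", normalized)

print(normalized)
print(with_space is not None)
print(without_space is not None)
```

Under that assumption, a pattern containing \s silently finds nothing, which is one way the "unexpected results" mentioned in the note could arise.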

Here's how to extract a single text block. Use this approach to extract text that is unique.

  1. Open the project.
  2. Open the Text tab.
  3. Open a diagram and locate the text you want to extract.
  4. Click the Label button on the toolbar.
  5. Zoom in as far as you can and draw a tight box around the text.

    Annotating text example

    Tip

    Selecting more area than necessary affects matching accuracy.

    The Text Category window displays.

  6. Select the type of text you are extracting.

    Important

    If you are extracting line numbers and want to associate those lines with symbols, you must select Line Number as the category.

  7. Click NEXT.

    The Review window displays with the extracted text.

    Line number review

  8. If DataSeer was unable to extract the text, enter the missing text in the field.

  9. Click APPROVE.

    The text is added to the specified category.


Managing Text

Editing Text

Here's how to edit text after you have extracted it:

  1. Open the project.
  2. Open the Text tab.
  3. Open the diagram where the text is located.
  4. Select the text category on the sidebar.

    The Text Editor displays with a list of extracted text.

  5. Locate the text in the list.

  6. Edit the value.
  7. Press Enter.

Reassigning the Text Type

You can reassign the type for specific text blocks. This lets you make adjustments to projects where text blocks were classified incorrectly.

Here's how to reassign text blocks to a new type:

  1. Open the project.
  2. Open the Text tab.
  3. Browse to the page where the text is located.
  4. Select the text type on the sidebar.

    The Text Editor displays with a list of extracted text.

  5. Select the checkbox next to the text to reassign.

  6. Click at the top of the Editor.
  7. Click CONFIRM.

Deleting Text

Here's how to delete text that you extracted by mistake:

  1. Open the project.
  2. Open the Text tab.
  3. Open the diagram where the text is located.
  4. Select the text type on the sidebar.

    The Text Editor displays with a list of extracted text.

  5. Select the checkbox next to the text to delete.

  6. Click at the top of the Editor.
  7. Click CONFIRM.

Tips

Searching with Regex

Common Regex Characters

These are common regex characters that you can use to provide targeted yet flexible inputs to your text searches.

| Regex Character | Meaning | Example | Interpretation | Potential Matches |
|---|---|---|---|---|
| ^ | Begins with | ^SH | Matches strings that begin with "SH" | SH9821, SHIFT |
| $ | Ends with | 02$ | Matches strings that end with "02" | A8202, 99902 |
| \d | Any digit | \d | Matches any digit | 9, 6 |
| [A-Za-z] | Any letter | [A-Za-z] | Matches any letter | T, v |
| \w | Any word character | \w | Matches any word character | 5, B, y |
| \s | Any whitespace character | \s | Matches any whitespace character | a space or tab |
| . | Any character | . | Matches any character | -, 9, r |
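
To try these characters outside DataSeer, here is a minimal Python sketch using the standard `re` module. It assumes that DataSeer's search behaves like a "contains" search (`re.search`) for these basic characters, which the source does not confirm. The `contains_match` helper is hypothetical.

```python
import re

def contains_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, candidate) pairs taken from the table above.
examples = [
    (r"^SH", "SH9821"),
    (r"^SH", "SHIFT"),
    (r"02$", "A8202"),
    (r"\d", "9"),
    (r"[A-Za-z]", "v"),
    (r"\w", "y"),
    (r"\s", " "),
    (r".", "-"),
]

results = [contains_match(p, t) for p, t in examples]
for (p, t), ok in zip(examples, results):
    print(f"{p!r:12} {t!r:10} {ok}")

# Counter-examples: the anchors tie the match to the start or end of the string.
print(contains_match(r"^SH", "WASH"))   # "SH" is present, but not at the start
print(contains_match(r"02$", "0200"))   # "02" is present, but not at the end
```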

Common Regex Pattern Tokens

These pattern tokens help you indicate how many times a specific character should appear within a string.

| Regex Token | Meaning | Example | Interpretation | Potential Matches |
|---|---|---|---|---|
| ? | The preceding character appears zero or one times | \d? | Matches strings where a digit appears zero or one times | REST, Nine4 |
| * | The preceding character appears zero or more times | [a-z]* | Matches strings where a lower case letter appears zero or more times | 9981, 97w1 |
| + | The preceding character appears one or more times | [A-Z]+ | Matches strings where an upper case letter appears one or more times | A, RT |
| {3} | The preceding character appears exactly three times | E{3} | Matches strings with three consecutive "E" characters | 67EEE, THREEE |
| {2,} | The preceding character appears two or more times | t{2,} | Matches strings with two or more consecutive "t" characters | 88tt |
| {1,3} | The preceding character appears between one and three times | o{1,3} | Matches strings with one to three consecutive "o" characters | those, 77ooo |
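
The Python sketch below shows what each example pattern actually captures. It uses the standard `re` module as a stand-in, assuming DataSeer's regex engine treats these tokens the same way. The `first_match` helper is hypothetical.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring matched by the pattern, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(repr(first_match(r"E{3}", "67EEE")))    # three consecutive E characters
print(repr(first_match(r"E{3}", "67EE")))     # only two E characters, so no match
print(repr(first_match(r"t{2,}", "88tt")))    # two or more t characters
print(repr(first_match(r"o{1,3}", "77ooo")))  # up to three o characters
print(repr(first_match(r"[A-Z]+", "RT")))     # one or more upper case letters

# ? and * allow zero occurrences, so on their own they match an empty
# string at the start of any text, including text with no digit at all.
print(repr(first_match(r"\d?", "REST")))
print(repr(first_match(r"[a-z]*", "97w1")))
```

Because `?` and `*` accept zero occurrences, they are most useful inside a longer pattern rather than as the whole pattern.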

Example: Line Numbers

To extract line numbers like the following:

1/2"-MCH-DL74-4201

You can use these regex patterns:

| Regex Pattern | Description |
|---|---|
| MCH | Matches text containing MCH |
| ."-. | Matches text containing x"-x, where x is any character |
| ^1/2 | Matches text starting with 1/2 |
| 4201$ | Matches text ending with 4201 |
| ^1.*01$ | Matches text starting with 1 and ending with 01 |
| ...-...-....-.... | Matches text in xxx-xxx-xxxx-xxxx format |
| .+-.+-.+-.+ | Matches text containing at least three dashes, each with one or more characters on either side |
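
As a quick check, the Python sketch below runs each pattern against the sample line number. It assumes "contains" semantics (`re.search`) and uses Python's `re` module in place of DataSeer's own engine.

```python
import re

line_number = '1/2"-MCH-DL74-4201'

patterns = [
    r"MCH",
    r'."-.',
    r"^1/2",
    r"4201$",
    r"^1.*01$",
    r"...-...-....-....",
    r".+-.+-.+-.+",
]

# Map each pattern to whether it matches somewhere in the sample line number.
matches = {p: re.search(p, line_number) is not None for p in patterns}
for pattern, ok in matches.items():
    print(f"{pattern:20} {ok}")

# A string with a single dash does not satisfy the three-dash pattern.
too_short = re.search(r".+-.+-.+-.+", "MCH-4201")
print(too_short is not None)
```

Broad patterns such as MCH are quick to write but may also pick up unrelated text, while the structural patterns at the bottom of the table are more selective.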

To learn more about how to use regex, check out the regular expressions 101 site.

When you are reviewing text located during a search, you can use the following keyboard shortcuts on the Review window:

| Shortcut | Action |
|---|---|
| A | Approve a match |
| Shift + A | Approve all matches |
| R | Reject a match |
| Esc | Stop the review |

Putting it all together: to target strings such as 23-BV-122-1, 99-RE-121-0, and 43-IT-127-8, you can use the pattern \d{2}-[A-Z]{2}-\d{3}-\d (two digits, two upper case letters, three digits, and one digit, separated by dashes).
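
This pattern can be tested before entering it in the Search window. The following Python sketch uses the standard `re` module, assuming DataSeer interprets the pattern the same way. The non-matching strings in `others` are made up for illustration.

```python
import re

TAG_PATTERN = re.compile(r"\d{2}-[A-Z]{2}-\d{3}-\d")

targets = ["23-BV-122-1", "99-RE-121-0", "43-IT-127-8"]
# Hypothetical near misses: too few digits, too few leading digits, lower case letters.
others = ["23-BV-12-1", "9-RE-121-0", "43-it-127-8"]

matched = [t for t in targets if TAG_PATTERN.fullmatch(t)]
rejected = [t for t in others if not TAG_PATTERN.search(t)]

print(matched)
print(rejected)

# Without anchors, the pattern also matches inside a longer string.
inside_longer = TAG_PATTERN.search("123-BV-122-15") is not None
# Adding ^ and $ restricts the match to the whole string.
anchored = re.search(r"^\d{2}-[A-Z]{2}-\d{3}-\d$", "123-BV-122-15") is not None
print(inside_longer, anchored)
```

If longer strings that merely contain the pattern should be excluded, anchoring it with ^ and $ is one option.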


Last update: August 29, 2022