Extracting Text¶
The Text tab is where users define line numbers, equipment tags, and other text data to embed as additional Symbol attribute data. Use it for text that the Find Attributes tool cannot extract automatically.
The following guides explain how to extract text in your projects.
Extracting Text¶
There are two ways to extract text based on the scenario:
Here's how to extract multiple text blocks using the Search feature. Use this approach where there are numerous instances of similar text.
- Open a project.
- Open the Text tab.
- Open a diagram and locate the text you want to extract.
-
Click on the toolbar.
The Search window displays.
-
Under Category Selection, select the type of text you are extracting.
Important
If you are extracting line numbers and want to associate the line number data with symbols, you must select Line Number as the category.
-
In Pattern, enter a regex pattern to help identify similar text.
Line Number Regex
The RegEx logic for Line Numbers omits whitespace from the detected strings. Avoid using \s (whitespace wildcard) in such searches, as it will return unexpected results.
-
Click SEARCH ALL PAGES or SEARCH CURRENT PAGE.
The Review window displays with the first match. The text is also highlighted on the diagram for reference.
-
Review the matches:
Info
- To approve a match, ensure that the correct Id is extracted and adjust if necessary, then click APPROVE.
- To approve all matches, click next to APPROVE, select Approve All, then click APPROVE ALL.
- To reject a match, click REJECT.
- To end the review at any time, click STOP REVIEW.
Tips
- Once you approve or reject a match, the next match displays for review. Continue until you review the full list of matches.
- As you approve matches, they get added to the list on the sidebar.
- Successful searches can be saved to Templates to be applied during project Processing.
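The whitespace note in the Line Number Regex callout above can be illustrated with a small sketch. It uses Python's `re` module as a stand-in, and it assumes that whitespace is removed from the detected string before the pattern is applied; the source does not describe DataSeer's internals, so the `normalize` helper and the sample string are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Hypothetical helper: drop all whitespace, as the Line Number logic is described to do."""
    return re.sub(r"\s+", "", text)

# Hypothetical detected text with stray spaces around the dashes.
detected = '1/2" - MCH - DL74 - 4201'
candidate = normalize(detected)

# A pattern that expects whitespace cannot match once the whitespace is gone.
with_whitespace = re.search(r"MCH\s-\sDL74", candidate)

# The same pattern without \s matches the normalized string.
without_whitespace = re.search(r"MCH-DL74", candidate)

print(candidate)
print("pattern with \\s matched:", with_whitespace is not None)
print("pattern without \\s matched:", without_whitespace is not None)
```

Under that assumption, a pattern written against the text as it appears on the diagram (with spaces) silently finds nothing, which is one way the "unexpected results" can show up.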
Here's how to extract a single text block. Use this approach to extract text that is unique.
- Open the project.
- Open the Text tab.
- Open a diagram and locate the text you want to extract.
- Click on the toolbar.
-
Zoom in as far as you can and draw a tight box around the text.
Tip
Selecting more area than necessary affects matching accuracy.
The Text Category window displays.
-
Select the type of text you are extracting.
Important
If you are extracting line numbers and want to associate those lines with symbols, you must select Line Number as the category.
-
Click NEXT.
The Review window displays with the extracted text.
-
If DataSeer was unable to extract the text, enter the missing text in the field.
-
Click APPROVE.
The text is added to the specified category.
Managing Text¶
Editing Text¶
Here's how to edit text after you have extracted it:
- Open the project.
- Open the Text tab.
- Open the diagram where the text is located.
-
Select the text category on the sidebar.
The Text Editor displays with a list of extracted text.
-
Locate the text in the list.
- Edit the value.
- Press Enter.
Reassigning the Text Type¶
You can reassign the type for specific text blocks. This lets you make adjustments to projects where text blocks were classified incorrectly.
Here's how to reassign text blocks to a new type:
- Open the project.
- Open the Text tab.
- Browse to the page where the text is located.
-
Select the text type on the sidebar.
The Text Editor displays with a list of extracted text.
-
Select the checkbox next to the text to reassign.
- Click at the top of the Editor.
- Click CONFIRM.
Deleting Text¶
Here's how to delete text that you extracted by mistake:
- Open the project.
- Open the Text tab.
- Open the diagram where the text is located.
-
Select the text type on the sidebar.
The Text Editor displays with a list of extracted text.
-
Select the checkbox next to the text to delete.
- Click at the top of the Editor.
- Click CONFIRM.
Tips¶
Searching with Regex¶
Common Regex Characters
These are common regex characters that you can use to provide targeted yet flexible inputs to your text searches.
Regex Character | Meaning | Example | Interpretation | Potential Matches |
---|---|---|---|---|
^ | Begins with | ^SH | Matches strings that begin with "SH" | SH9821 , SHIFT |
$ | Ends with | 02$ | Matches strings that end with "02" | A8202 , 99902 |
\d | Any digit | \d | Matches any digit | 9 , 6 |
[A-Za-z] | Any letter | [A-Za-z] | Matches any letter | T , v |
\w | Any word character | \w | Matches any word character | 5 , B , y |
\s | Any whitespace character | \s | Matches any whitespace character | whitespace |
. | Any character | . | Matches any character | - , 9 , r |
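The characters in the table above can be tried out before entering a pattern in the Search window. The following is a minimal sketch using Python's `re` module; the regex flavor that DataSeer uses is not stated in this guide, so treat Python's behavior as an approximation. The `CASES` list and the `matches` helper are illustrative names, and the non-matching strings are made up for contrast.

```python
import re

# (pattern, text, expected) triples; the texts are illustrative strings only.
CASES = [
    (r"^SH", "SH9821", True),      # begins with "SH"
    (r"^SH", "PUSH", False),       # contains "SH" but does not begin with it
    (r"02$", "A8202", True),       # ends with "02"
    (r"02$", "0210", False),       # contains "02" but does not end with it
    (r"\d", "Nine4", True),        # contains a digit
    (r"\d", "REST", False),        # no digit anywhere
    (r"[A-Za-z]", "9981", False),  # no letter anywhere
    (r"\w", "--y--", True),        # "y" is a word character
    (r"\s", "A B", True),          # contains a space
    (r".", "-", True),             # any single character
]

def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

results = [matches(pattern, text) for pattern, text, _ in CASES]

for (pattern, text, expected), found in zip(CASES, results):
    print(f"{pattern!r:12} {text!r:10} found={found} expected={expected}")
```

Note that `re.search` looks for the pattern anywhere in the text, which is why `^` and `$` are needed to pin a pattern to the start or end of a string.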
Common Regex Pattern Tokens
These pattern tokens help you indicate how many times a specific character should appear within a string.
Regex Token | Meaning | Example | Interpretation | Potential Matches |
---|---|---|---|---|
? | The preceding character appears zero or one times | \d? | Matches strings where a digit appears zero or one times | REST , Nine4 |
* | The preceding character appears zero or more times | [a-z]* | Matches strings where a lower case letter appears zero or multiple times | 9981 , 97w1 |
+ | The preceding character appears one or more times | [A-Z]+ | Matches strings where an upper case letter appears one or more times | A , RT |
{3} | The preceding character appears exactly three times | E{3} | Matches strings with three consecutive "E" characters | 67EEE , THREEE |
{2,} | The preceding character appears two or more times | t{2,} | Matches strings with two or more consecutive "t" characters | 88tt |
{1,3} | The preceding character appears between one and three times | o{1,3} | Matches strings with one to three consecutive "o" characters | those , 77ooo |
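The example patterns in the table above can be checked with a short script. This is a sketch in Python's `re` module, assumed here to behave like the regex engine behind the Search window, which this guide does not specify. The `contains` and `whole` helpers are illustrative names. One detail worth seeing in action: because `?` and `*` allow zero occurrences, a pattern such as `\d?` on its own is found in every string, so these tokens are most useful inside a longer pattern.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

def whole(pattern: str, text: str) -> bool:
    """True only if the pattern describes the entire text."""
    return re.fullmatch(pattern, text) is not None

# (label, actual result, expected result)
checks = [
    ("E{3} found in THREEE", contains(r"E{3}", "THREEE"), True),
    ("E{3} found in THREE", contains(r"E{3}", "THREE"), False),   # only two E's
    ("t{2,} found in 88tt", contains(r"t{2,}", "88tt"), True),
    ("t{2,} found in 88t", contains(r"t{2,}", "88t"), False),     # only one t
    ("o{1,3} found in those", contains(r"o{1,3}", "those"), True),
    ("[A-Z]+ found in RT", contains(r"[A-Z]+", "RT"), True),
    ("[A-Z]+ found in 97w1", contains(r"[A-Z]+", "97w1"), False),  # no upper case letter
    # \d? may match nothing at all, so it is "found" even without digits.
    (r"\d? found in REST", contains(r"\d?", "REST"), True),
    # Inside a longer pattern, \d? means "an optional trailing digit".
    (r"[A-Za-z]+\d? is all of Nine4", whole(r"[A-Za-z]+\d?", "Nine4"), True),
    (r"[A-Za-z]+\d? is all of REST", whole(r"[A-Za-z]+\d?", "REST"), True),
    (r"[A-Za-z]+\d? is all of Nine42", whole(r"[A-Za-z]+\d?", "Nine42"), False),
]

for label, actual, expected in checks:
    print(f"{label}: {actual} (expected {expected})")
```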
Example: Line Numbers
To extract line numbers like the following:
1/2"-MCH-DL74-4201
You can use these regex patterns:
Regex Pattern | Description |
---|---|
MCH | Matches text containing MCH |
."-. | Matches text containing x"- |
^1/2 | Matches text starting with 1/2 |
4201$ | Matches text ending with 4201 |
^1.*01$ | Matches text starting with 1 and ending with 01 |
...-...-....-.... | Matches text in xxx-xxx-xxxx-xxxx format |
.+-.+-.+-.+ | Matches text containing three dashes - |
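As a quick sanity check, the patterns in the table above can be run against the sample line number. The sketch below uses Python's `re.search`, which finds a pattern anywhere in the text; it is an assumption that the Search window treats patterns the same way. The second line number, `3/4"-ABC-XY12-9999`, is a made-up value added only to show which patterns are specific to the sample and which are general.

```python
import re

SAMPLE = '1/2"-MCH-DL74-4201'
OTHER = '3/4"-ABC-XY12-9999'  # hypothetical line number for contrast

PATTERNS = [
    r"MCH",
    r'."-.',
    r"^1/2",
    r"4201$",
    r"^1.*01$",
    r"...-...-....-....",
    r".+-.+-.+-.+",
]

def found(pattern: str, text: str) -> bool:
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

sample_hits = {p: found(p, SAMPLE) for p in PATTERNS}
other_hits = {p: found(p, OTHER) for p in PATTERNS}

for p in PATTERNS:
    print(f"{p:22} sample={sample_hits[p]}  other={other_hits[p]}")
```

Every pattern finds the sample. Only the structural patterns (`."-.`, `...-...-....-....`, and `.+-.+-.+-.+`) also find the second value, which is the behavior you want when searching for numerous instances of similar text.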
To learn more about how to use regex, check out the regular expressions 101 site.
When you are reviewing text located during a search, you can use the following keyboard shortcuts on the Review window:
Shortcut | Action |
---|---|
A | Approve a match |
Shift + A | Approve all matches |
R | Reject a match |
Esc | Stop the review |
Putting it all together, to target strings such as 23-BV-122-1, 99-RE-121-0, and 43-IT-127-8, you can use the capture pattern: \d{2}-[A-Z]{2}-\d{3}-\d
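The capture pattern above can be verified against the three target strings with a short sketch. It uses Python's `re` module, which is an assumption about regex flavor rather than a statement about DataSeer, and `re.fullmatch`, which requires the whole string to fit the pattern. The entries in `NON_TARGETS` are made-up strings for contrast.

```python
import re

PATTERN = re.compile(r"\d{2}-[A-Z]{2}-\d{3}-\d")

TARGETS = ["23-BV-122-1", "99-RE-121-0", "43-IT-127-8"]

# Hypothetical strings that should not qualify as the same kind of tag.
NON_TARGETS = [
    "23-bv-122-1",         # lower case letters
    "2-BV-122-1",          # only one leading digit
    "23-BV-12-1",          # only two digits in the third group
    '1/2"-MCH-DL74-4201',  # a line number in a different format
]

target_results = [PATTERN.fullmatch(s) is not None for s in TARGETS]
non_target_results = [PATTERN.fullmatch(s) is not None for s in NON_TARGETS]

for s, ok in zip(TARGETS + NON_TARGETS, target_results + non_target_results):
    print(f"{s:22} {'match' if ok else 'no match'}")

# An unanchored search is looser: it also finds the pattern inside longer text.
loose = PATTERN.search("123-BV-122-15") is not None
print("found inside 123-BV-122-15:", loose)
```

If longer strings that merely contain the pattern should be excluded, anchoring it as `^\d{2}-[A-Z]{2}-\d{3}-\d$` has the same effect as the whole-string check used here.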