Stakeholders in memoQ Server Projects: A Quick Overview


Regular Expression

[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}

Matching Text

202ca4c2-749d-4f54-ae02-fdf19939ef10

The Scary Bit
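To take some of the fear out of the expression above, here is a minimal Python sketch. Python's `re` module is used purely for illustration (memoQ itself uses the .NET engine), and the names `GUID` and `sample` are made up for this example. The pattern reads as: 8 hexadecimal characters, then three runs of 4, then a run of 12, separated by hyphens.

```python
import re

# Five runs of hexadecimal characters separated by hyphens: 8-4-4-4-12.
GUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")

sample = "202ca4c2-749d-4f54-ae02-fdf19939ef10"

# fullmatch succeeds only if the entire string fits the pattern.
print(GUID.fullmatch(sample) is not None)           # True
print(GUID.fullmatch("202ca4c2-749d") is not None)  # False: too short
```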


What Are Regular Expressions?

• They are not a programming language

• Symbols that describe a text pattern

• Used to match, search and manipulate text

• A more powerful “Search and replace”

• Called “regex” for short

• There are several regex engines or “flavours”

• memoQ uses Microsoft .NET


How Long Does It Take to Learn a New Language?

*http://www.effectivelanguagelearning.com/language-guide/language-difficulty


How Long Does It Take to Learn Regex?

You can start creating your own basic expressions within a few minutes.


SIGH OF RELIEF


What Are They Used For?

• Search and match:

– Email addresses

– URLs

– Tags and placeholders

– Phone number formats

– Alternate spellings

– Consistency checks (e.g. lower case v. upper case)

– Trailing spaces

– Punctuation sequences (for segmentation)

– Other repetitive/sequential text


Where in memoQ?


Two Types of Regex Text

Literal characters

bomb

bomb

bomber

A-bomb

The bomb went off.

Bombs off.

b o m b

Metacharacters

\

.

*

?

+

[]

-

|

()

{}

$

^
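As a rough illustration of literal characters, the sketch below searches the sample texts for the literal expression bomb. It uses Python's `re` module rather than memoQ's .NET engine and assumes a case-sensitive search; the variable names are invented for this example.

```python
import re

samples = ["bomb", "bomber", "A-bomb", "The bomb went off.", "Bombs off.", "b o m b"]

# A literal expression matches exactly the characters typed, in that order.
hits = [s for s in samples if re.search(r"bomb", s)]
print(hits)  # ['bomb', 'bomber', 'A-bomb', 'The bomb went off.']
```

"Bombs off." is skipped because of the capital B, and "b o m b" because the spaces are not part of the expression.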


Metacharacters

. Any character

* Preceding item zero or more times

? Preceding item zero or one time

+ Preceding item one or more times

[ Begin character set

] End character set

- Separator in ranges

| Either or

{} Bean counting (how many times the preceding item repeats)

^ Start of segment // Negate a character set

$ End of segment

( Begin group

) End group
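A few of the metacharacters listed above, tried out in a short Python sketch. Python's `re` module stands in for memoQ's .NET engine here, and the sample strings and variable names are invented; these basic metacharacters behave the same way in both engines.

```python
import re

optional = re.findall(r"colou?r", "color colour")     # ? = zero or one "u"
any_char = re.findall(r"gr.y", "grey gray")           # . = any character
one_plus = re.findall(r"lo+l", "ll lol loool")        # + = one or more "o"
zero_plus = re.findall(r"lo*l", "ll lol loool")       # * = zero or more "o"
counted = re.findall(r"\d{2,3}", "7 42 365")          # {} = two to three digits
either = re.findall(r"cat|dog", "cat, dog, bird")     # | = either or
at_start = bool(re.search(r"^Dear", "Dear customer")) # ^ = start of the text
at_end = bool(re.search(r"\.$", "No full stop"))      # $ = end of the text

print(optional, any_char, one_plus, zero_plus, counted, either, at_start, at_end)
```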


Character Sets

Matches any one of the characters in the set, but only once, unless bean counting {} or another quantifier (*, +, ?) says otherwise.

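[a-z] Lower case

[A-Z] Upper case

[a-zA-Z] Any case (a range must run in ascending order, so [a-Z] is not valid)

[0-9] Digits

[0-9A-Za-z] Digits + letters ([A-z] would also let through the punctuation that sits between Z and a)

\p{Ll} Lower + special letters

\p{Lu} Upper + special letters

\p{L} Any case + special letters

Can be negated using ^

[^0-9] Any character except a digit

Can be combined

[0-9a-e ,]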
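The sketch below tries out a few character sets in Python. Two caveats, both assumptions of this example rather than statements about memoQ: Python's standard `re` module does not support \p{...} classes, so those are left out, and the sample strings are invented. It also shows two range pitfalls: a reversed range such as [a-Z] is rejected, and [A-z] silently includes the punctuation between Z and a.

```python
import re

upper = re.findall(r"[A-Z]", "memoQ Server")        # any one upper-case letter
not_digit = re.findall(r"[^0-9]", "a1b2")           # negated set
combined = re.findall(r"[0-9a-e ,]+", "12, 3b x")   # digits, a-e, space, comma

# A range must ascend: "a" comes after "Z", so [a-Z] is an error.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False

# [A-z] also covers [ \ ] ^ _ and `, which is rarely what is wanted.
too_wide = re.findall(r"[A-z]", "a_Z^")

print(upper, not_digit, combined, reversed_range_ok, too_wide)
```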


Shorthand Character Sets

\d Digit

\w Digit OR letter (also matches the underscore)

\s Whitespace

\b Boundary (beginning OR end of word)

\t Tab

\r Carriage return

\n New line

\D Not a digit

\W Not a digit, a letter OR an underscore

\S Not a whitespace

\tag memoQ tag
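A short Python sketch of the common shorthands. It uses Python's `re` module as a stand-in for the .NET engine, with invented sample strings; \tag is specific to memoQ and has no Python equivalent, so it is not shown.

```python
import re

digits = re.findall(r"\d+", "Take 225 mg twice")        # runs of digits
words = re.findall(r"\w+", "order_no 42!")              # letters, digits, underscore
pieces = re.split(r"\s+", "a  b\tc")                    # split on any whitespace
whole = re.findall(r"\bcat\b", "cat concat cats cat.")  # "cat" as a whole word only

print(digits, words, pieces, whole)
```

Note that \b matches a position, not a character, which is why "concat" and "cats" are left alone.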


“Escaping” Metacharacters

If you need to match a special character in the text, you will have to “escape” it, or mark it for its literal meaning.

This is achieved by putting a backslash in front of it.

\(

\)

\{

\}

\$

\^

\!

\\

\.

\?

\*

\+

\[

\]

\-

\|
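The effect of escaping can be seen in the Python sketch below (Python's `re` module, invented sample text). The helper `re.escape` is a Python convenience that adds the backslashes for you; whether memoQ offers anything similar is not covered here.

```python
import re

text = "Price: 3.50 (approx.) or 3950?"

# Unescaped, the dot means "any character", so 3950 matches as well.
unescaped = re.findall(r"3.50", text)
# Escaped, the dot only matches a literal full stop.
escaped = re.findall(r"3\.50", text)

# re.escape() builds an escaped pattern from a literal string.
literal = re.search(re.escape("(approx.)"), text).group()

print(unescaped, escaped, literal)
```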


Find and Replace

Replace expressions allow you to choose which parts of the text to replace and which parts to keep as they are. This is achieved via groups (): each group in the search expression can be reused in the replacement as $1, $2, and so on.

Search: (\d{1,})\s{1,}[mM][gG]

Replace: $1 mg

Finds: 225 mG

Replaces with: 225 mg
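The same search and replace can be sketched in Python. One difference to keep in mind: Python's `re` module writes the group reference as \1, where the .NET engine used by memoQ writes $1. The function name `normalise_mg` is made up for this example.

```python
import re

def normalise_mg(text):
    # Group 1 keeps the number; spacing and case of the unit are rewritten.
    return re.sub(r"(\d{1,})\s{1,}[mM][gG]", r"\1 mg", text)

print(normalise_mg("225 mG"))              # 225 mg
print(normalise_mg("Take 10   MG daily"))  # Take 10 mg daily
```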


Greedy v. Lazy


Dangers of Greediness

By default, regex quantifiers are greedy: they match as much text as they can. It is a good habit to limit your expressions as much as possible to avoid matching more text than you intend to. Use the non-greedy marker ? after * and +. Example:

In the segment “All purées contain at least 10% of the main ingredient, unless otherwise specified in the purée description.”:

pur.*\b will match everything from “purées” up to and including “description” (the last word boundary in the segment).

pur.*?\b will match “purées” only (it stops at the first word boundary).
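The difference between the two expressions can be checked with a small Python sketch. Python's `re` module is used for illustration only; like .NET, it treats é as a word character, so the word boundaries fall in the same places. The é is written as \u00e9 to keep the example independent of file encoding.

```python
import re

text = ("All pur\u00e9es contain at least 10% of the main ingredient, "
        "unless otherwise specified in the pur\u00e9e description.")

# Greedy: .* grabs as much as possible, then backs off to the last boundary.
greedy = re.search(r"pur.*\b", text).group()
# Lazy: .*? grabs as little as possible, stopping at the first boundary.
lazy = re.search(r"pur.*?\b", text).group()

print(len(greedy), greedy[-11:])  # a long match ending in "description"
print(lazy)                       # purées
```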


Auto-Translation: Practical Cases

• Email addresses

\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

• URLs

(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?

• Phone numbers

\d{5}\s\d{6} (matches 01908 443300)

\d{5}-\d{6} (matches 01908-443300)

\+\d{2}\s\(0\)\s\d{4}\s\d{6} (matches +44 (0) 1908 443300)

• Duplicate word pairs*

(\b\w+ \w+\b) \b\1\b

*Published by Max B. on the Yahoo mQ group
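Three of the expressions above, tried in Python as a quick sanity check. This is only a sketch: it runs in Python's `re` module rather than in memoQ, the sample sentence and the address in it are invented, and the URL expression is left out.

```python
import re

EMAIL = re.compile(r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*")
PHONE_INTL = re.compile(r"\+\d{2}\s\(0\)\s\d{4}\s\d{6}")
DUPLICATE_PAIR = re.compile(r"(\b\w+ \w+\b) \b\1\b")

text = ("Write to jane.doe@example.com or call +44 (0) 1908 443300 "
        "at the at the latest.")

email = EMAIL.search(text).group()
phone = PHONE_INTL.search(text).group()
# \1 refers back to whatever group 1 matched, so the pair must repeat exactly.
duplicate = DUPLICATE_PAIR.search(text).group()

print(email)
print(phone)
print(duplicate)
```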


Segmentation: Practical Case

SOURCE: “Manufactured in China (PRC) for the UK market. Ingredients: Lemon Grass Purée (15%), Red Chilli Purée (11%), Onion, Water, Coconut Milk, Red Pepper, Galangal (5%), Sugar (Sulphites), Lime Juice From Concentrate (Sulphites), Salt, Rapeseed Oil, Garlic Purée, Rice Wine Vinegar (Sulphites), Lime Leaves (2.5%), Yeast Extract, Chilli Flakes, Cornflour, Tamarind Paste, Coriander, Cayenne Pepper, Paprika Extract.”

SOLUTION: Split the segment before an opening bracket if the closing bracket is followed by a comma, a space and an upper case letter

[\s]+#!#\([\s]*[\p{L}0-9]*\.?\d*\s*%?\),\s+\p{Lu}
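The idea behind this rule can be approximated outside memoQ with the Python sketch below. It is not the memoQ rule itself: Python's `re` module has no #!# break marker and no \p{...} classes, so this version uses a lookahead for the part after the break, [^\W_] in place of [\p{L}0-9], and plain [A-Z] in place of \p{Lu}. The names `BREAK` and `split_before_brackets` are hypothetical.

```python
import re

# Whitespace that is followed by "(...)," plus a space and an upper-case letter.
# The lookahead (?=...) checks the bracket without consuming it.
BREAK = re.compile(r"\s+(?=\(\s*[^\W_]*\.?\d*\s*%?\),\s+[A-Z])")

def split_before_brackets(text):
    # Split at each qualifying run of whitespace; the bracket starts the next piece.
    return BREAK.split(text)

sample = ("Manufactured in China (PRC) for the UK market. Ingredients: "
          "Lemon Grass Pur\u00e9e (15%), Red Chilli Pur\u00e9e (11%), Onion, "
          "Galangal (5%), Sugar (Sulphites), Lime Leaves (2.5%), Yeast Extract.")

for piece in split_before_brackets(sample):
    print(piece)
```

"(PRC)" does not trigger a split because it is followed by a space rather than a comma.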


Regex Tagger: Practical Case

SOURCE: “Dear [%$FIRSTNAME%] [%$LASTNAME%], Your online order placed on [%$WEBSITE%] on [%$DATE%] and processed as the authorized vendor of [%$RANGE%] products, has been successfully completed (order number: [%$REFNO%]). Please note that [%if $ORDER != ""%][%$ORDER%][%else%] [%$COMPANY%] will appear on your bank statement, instead of [%$RANGE%].”

SOLUTION: Create a cascading filter (Plain text + Regex tagger) and add the following to the tagger.

\[%.*?%\]

OR, if you want to be more strict

\[%[a-z]+%\]

\[%\$[A-Z]+%\]

\[%if .*?\!\=.*?%\]
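To see what the tagger expressions would pick up, here is a small Python sketch on a shortened version of the source text. It only lists the matches using Python's `re` module; it does not reproduce memoQ's Regex tagger, and the variable names are invented. The greedy variant is included to show why the lazy marker matters here.

```python
import re

text = "Dear [%$FIRSTNAME%] [%$LASTNAME%], order number: [%$REFNO%]."

lazy_tags = re.findall(r"\[%.*?%\]", text)         # one match per placeholder
strict_tags = re.findall(r"\[%\$[A-Z]+%\]", text)  # only [%$UPPERCASE%] placeholders
greedy_tags = re.findall(r"\[%.*%\]", text)        # swallows everything in between

print(lazy_tags)
print(strict_tags)
print(greedy_tags)
```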


Resources

• Regex 101

https://regex101.com/

• Regex Pal

http://www.regexpal.com/

• Using regular expressions in memoQ (Basic level), by Miklós Urbán

https://www.memoq.com/recorded-webinars

• “Do the magic: Regular Expressions in FrameMaker”, by Marek Pawelec

https://blogs.adobe.com/techcomm/2016/03/framemaker-regular-expressions.html

• memoQ Yahoo Group

https://groups.yahoo.com/neo/groups/

• Regex Hero

http://regexhero.net/reference/

• Regex Cheat Sheet

https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/


Queries and Feedback

Please send any comments, questions or feedback to:

[email protected]