Localizing dynamic websites created from open source content management systems

memoQfest 2012, May 10, 2012, Budapest

Daniel Zielinski, Martin Beuster, Loctimize GmbH

[daniel|martin]@loctimize.com

www.loctimize.com

20120510 WebL10N MemoQfest Budapest



Agenda

• Open source content management systems

• The localization challenges

• General localization strategies

• Conclusions

© 2012 Loctimize – All rights reserved


Challenges


• Create / update content

• Identify content

• Extract content

• Prepare content

• Translate content

• Integrate translated content

• Test localization

• Fix bugs

• Publish localized website


Identify content - Database

• Most of the content is stored in databases

• Databases are made up of related tables

• The tables are made up of rows and columns

• The fields contain the content in different formats (text, HTML, XML, proprietary formats) as well as metadata used for identifying/filtering the relevant content, for example:

– translate = 0

– language = 2

– published = 1

– deleted = 1
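Metadata flags like these decide which rows are worth extracting. A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `content` table whose columns are named after the flags above (real CMSs use their own schemas, table prefixes and language IDs): only rows flagged for translation, in the source language, published and not deleted are selected.

```python
import sqlite3

# Hypothetical schema: table and column names mirror the flags on the slide
# and are NOT taken from a particular CMS.
SCHEMA = """
CREATE TABLE content (
    id        INTEGER PRIMARY KEY,
    body      TEXT,
    translate INTEGER,  -- 1 = translate, 0 = skip
    language  INTEGER,  -- numeric language ID used by the CMS
    published INTEGER,  -- 1 = visible on the website
    deleted   INTEGER   -- 1 = in the trash
)
"""


def select_translatable(conn, source_language):
    """Return (id, body) of published, non-deleted rows flagged for translation."""
    query = (
        "SELECT id, body FROM content "
        "WHERE translate = 1 AND language = ? AND published = 1 AND deleted = 0 "
        "ORDER BY id"
    )
    return conn.execute(query, (source_language,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "<p>Welcome</p>", 1, 2, 1, 0),  # extracted
        (2, "Internal note", 0, 2, 1, 0),   # translate = 0
        (3, "Old news", 1, 2, 1, 1),        # deleted = 1
        (4, "Draft", 1, 2, 0, 0),           # not published
    ],
)
rows = select_translatable(conn, source_language=2)  # [(1, '<p>Welcome</p>')]
```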


Identify content - Database


[Screenshot: database table with fields annotated "HTML content" and "Text content?"]
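Whether a field holds HTML or plain text decides which filter it needs. A rough heuristic sketch, assuming that any opening or self-closing tag in a value means "treat as HTML"; it uses the standard-library `html.parser`, and the function name `classify_field` is illustrative only.

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts opening and self-closing tags seen while parsing."""

    def __init__(self):
        super().__init__()
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1

    def handle_startendtag(self, tag, attrs):
        self.tags += 1


def classify_field(value):
    """Heuristically label a field value as 'empty', 'html' or 'text'."""
    if value is None or not value.strip():
        return "empty"
    counter = _TagCounter()
    counter.feed(value)
    counter.close()
    return "html" if counter.tags else "text"
```

Fields classified as "html" would go through an HTML filter; "text" fields can be treated as plain paragraphs.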


Identify content - File system


• Template files (HTML, CSS, JPG, PNG, GIF)

• Configuration files (INI, PHP, PROPERTIES, TXT…)

• Localization files (XLIFF, XML, PHP…)

• User files (PDF, DOC, XLS, PPT,…)
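Taking stock of the file system can be scripted. The sketch below walks a local copy of a site with `pathlib` and groups files by the categories listed above. The extension mapping is an assumption derived from that list; PHP, which appears as both a configuration and a localization format, is flagged for a manual check.

```python
import tempfile
from pathlib import Path

# Assumed mapping derived from the list above.
CATEGORIES = {
    **dict.fromkeys([".html", ".htm", ".css", ".jpg", ".jpeg", ".png", ".gif"], "template"),
    **dict.fromkeys([".ini", ".properties", ".txt"], "configuration"),
    **dict.fromkeys([".xliff", ".xlf", ".xml"], "localization"),
    **dict.fromkeys([".pdf", ".doc", ".xls", ".ppt"], "user"),
    ".php": "configuration/localization (check manually)",
}


def inventory(root):
    """Group every file below root by category; unknown extensions go to 'other'."""
    groups = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            category = CATEGORIES.get(path.suffix.lower(), "other")
            groups.setdefault(category, []).append(path.relative_to(root).as_posix())
    return groups


# Usage: build a tiny fake site and take stock of it.
with tempfile.TemporaryDirectory() as site:
    for name in ["templates/home.html", "images/logo.png", "language/en-GB/en-GB.ini",
                 "docs/manual.pdf", "helper.php"]:
        target = Path(site, name)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    groups = inventory(site)
```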


Identify content - Template files (HTML)


[Screenshot: HTML template file, annotated "Translatable content?"]
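To judge how much translatable text a template holds, a small extractor can collect visible text plus user-facing attributes. This is a sketch under stated assumptions: `<script>` and `<style>` contents are skipped, only the `alt`, `title` and `placeholder` attributes are treated as translatable, and CMS-specific template syntax is not handled.

```python
from html.parser import HTMLParser


class TemplateTextExtractor(HTMLParser):
    """Collects visible text and selected attribute values from an HTML template.

    Assumptions: <script>/<style> contents are not translatable, and only the
    alt, title and placeholder attributes carry translatable text.
    """

    SKIP = {"script", "style"}
    ATTRS = {"alt", "title", "placeholder"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0

    def _collect_attrs(self, attrs):
        for name, value in attrs:
            if name in self.ATTRS and value and value.strip():
                self.segments.append(value.strip())

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        self._collect_attrs(attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> can carry alt/title text too.
        self._collect_attrs(attrs)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip_depth:
            self.segments.append(text)


def extract_segments(html):
    """Return the translatable segments of an HTML template, in document order."""
    parser = TemplateTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments
```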


Identify content - Configuration files – INI Files

• Some of the content is stored in INI files.

• It is stored in key-value pairs.

key = value
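Reading the pairs is straightforward as long as the key stays untouched and only the value goes to translation. A minimal sketch, assuming Joomla!-style lines such as `KEY="Value"` with `;` comments (the sample keys are made up). A hand-written loop is used instead of `configparser`, which expects a section header before the first key and lowercases keys by default.

```python
def parse_ini_strings(text):
    """Return {key: value} from KEY="Value" lines; comments and [sections] are skipped."""
    strings = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith((";", "#", "[")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {line_no}: no '=' in {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        strings[key.strip()] = value
    return strings


sample = """; Hypothetical language file
COM_EXAMPLE_TITLE="Welcome"
COM_EXAMPLE_MORE="Read <b>more</b>"
"""
strings = parse_ini_strings(sample)
# {'COM_EXAMPLE_TITLE': 'Welcome', 'COM_EXAMPLE_MORE': 'Read <b>more</b>'}
```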


Identify content - Configuration files – PHP files

• Some of the content is stored in PHP files

• It is stored in key-value pairs or arrays
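Strings in PHP files can often be located with regular expressions, as long as the file sticks to simple patterns. The sketch assumes two common shapes, `define('KEY', 'Value');` and `$array['key'] = 'Value';`. Anything built dynamically (concatenation, heredoc, multi-line values) is out of scope and would need a real PHP parser.

```python
import re

# Assumed shapes (not exhaustive):
#   define('KEY', 'Value');      -> constants
#   $lang['key'] = 'Value';      -> array entries
# Group 3 captures the opening quote so the value ends at the matching quote;
# backslash escapes such as \' are kept as written. Single-line values only.
_VALUE = r"(['\"])(?P<value>(?:\\.|(?!\3).)*)\3"
DEFINE_RE = re.compile(r"define\(\s*(['\"])(?P<key>[^'\"]+)\1\s*,\s*" + _VALUE + r"\s*\)")
ARRAY_RE = re.compile(r"\$\w+\[\s*(['\"])(?P<key>[^'\"]+)\1\s*\]\s*=\s*" + _VALUE + r"\s*;")


def extract_php_strings(source):
    """Return (key, value) pairs in the order they appear in the file."""
    hits = []
    for pattern in (DEFINE_RE, ARRAY_RE):
        for m in pattern.finditer(source):
            hits.append((m.start(), m.group("key"), m.group("value")))
    return [(key, value) for _, key, value in sorted(hits)]


php = """<?php
define('_WELCOME', 'Welcome to our site');
$lang['read_more'] = "Read more";
$lang['its_here'] = 'It\\'s here';
"""
pairs = extract_php_strings(php)
```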


Identify content - Localization files - XML


[Screenshot: XML localization file, annotated "UI strings", "IDs", "Language" and "groups"]
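A structure like the one in the screenshot (UI strings addressed by IDs, organized in groups, with a language attribute) can be read with the standard library. The element and attribute names below (`language`, `code`, `group`, `name`, `string`, `id`) are hypothetical; each CMS uses its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout (element and attribute names are made up):
SAMPLE = """<language code="en-GB">
  <group name="buttons">
    <string id="btn_save">Save</string>
    <string id="btn_cancel">Cancel</string>
  </group>
  <group name="messages">
    <string id="msg_saved">Your changes have been saved.</string>
  </group>
</language>"""


def read_ui_strings(xml_text):
    """Return (language code, {(group, string id): text})."""
    root = ET.fromstring(xml_text)
    strings = {}
    for group in root.iter("group"):
        for entry in group.iter("string"):
            strings[(group.get("name"), entry.get("id"))] = entry.text or ""
    return root.get("code"), strings


language, strings = read_ui_strings(SAMPLE)
```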


Extract content – Database

• Manually by copying

• Available extensions

– that understand the I18N/L10N logic of the CMS

– that extract the content and export it into a translatable exchange format

• Develop scripts and exchange formats

– to extract and export the content into a translatable exchange format


Extract content – Database

• Joomla!: Joom!Fish Plus, Jolomea (XML, XLIFF, PO)

• TYPO3: Localization Manager (XML)

• Drupal: i18n, Translation Management (XML, XLIFF)

• WordPress: Easy Translator Pro (PO)

• WordPress: WPML (XLIFF)


Extract content – Database


[Screenshot: exported file, annotated "Metadata", "Page content" and "Source URL"]


Extract content – Database

[Screenshot: exported file, annotated "IDs"]


Extract content – Files

• Copy files

• Know the file structure of the CMS

• FTP access

• Access to CMS backend with appropriate rights


Automate workflow?


• Use a content connector and/or API to pass the localisable content on to memoQ.


Prepare files

• Define non-translatable content

– Add additional tags

– Define filter settings: XML filter, HTML filter, RegEx text filter, cascading filters, RegEx tagger

• Join files
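A RegEx tagger turns matches of a pattern into protected tags so translators cannot change them. The same idea in a few lines of Python, assuming the placeholder syntax listed under "Translate content" (`%1`, `$2`, `{1}`, `$VAR`, and the literal sequences `\n`, `\t`); the exact pattern for a real project has to come from its files, and memoQ's regex engine is a different implementation.

```python
import re

# Assumed placeholder syntax: %1, $2, {1}, $VAR and the literal escape
# sequences \n, \t (a backslash followed by a letter, not a real newline).
PLACEHOLDER_RE = re.compile(r"%\d+|\$\d+|\{\d+\}|\$[A-Z_][A-Z0-9_]*|\\[nt]")


def tag_placeholders(text):
    """Split text into ('text', ...) and ('tag', ...) pieces, like a RegEx tagger."""
    pieces, pos = [], 0
    for m in PLACEHOLDER_RE.finditer(text):
        if m.start() > pos:
            pieces.append(("text", text[pos:m.start()]))
        pieces.append(("tag", m.group()))  # protected, must survive translation
        pos = m.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces
```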


Translate content

• Lack of context

– Translation of content deltas (updates)

– Translation without visual information (XML, INI)

• Placeholders like %1, $2, {1}, $VAR, \n, \t
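Placeholders can be dropped or mistyped without any visual cue, so a simple consistency check between source and target helps. A sketch, assuming the placeholder syntax from the list above; the function name `placeholder_mismatches` is illustrative, and the order of placeholders is deliberately ignored because word order may change in translation.

```python
import re
from collections import Counter

# Assumed placeholder syntax, from the list above: %1, $2, {1}, $VAR, \n, \t.
PLACEHOLDER_RE = re.compile(r"%\d+|\$\d+|\{\d+\}|\$[A-Z_][A-Z0-9_]*|\\[nt]")


def placeholder_mismatches(source, target):
    """Report placeholders missing from, or added to, the target. Order is ignored."""
    src = Counter(PLACEHOLDER_RE.findall(source))
    tgt = Counter(PLACEHOLDER_RE.findall(target))
    return {
        "missing": sorted((src - tgt).elements()),
        "extra": sorted((tgt - src).elements()),
    }
```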


Translate content - HTML

• HTML files are added to memoQ using the standard filter.

• Tags and attributes cannot be configured (e.g. for localized hyperlinks).

• A preview is available to translators and revisers.


Translate content - HTML

[Screenshot: memoQ with the editor, lookup results and preview panes]


Translate content – XML files

• Add the XML files to memoQ using a pre-defined XML filter (and a cascading HTML/RegEx text filter).

• Content is grouped by page.

• The source URL is stored in the comments field.


Translate content - XML

[Screenshot: memoQ with the editor, lookup results and preview panes, and the source URL]


Translate content – INI files

• Add the INI files to memoQ using a RegEx text filter and a cascading HTML filter.

• The RegEx text filter defines paragraphs as ([^=]*=)(.+), with group 2 (the value) as the translatable content.
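The paragraph pattern can be tried out before setting up the filter. Applied with Python's `re` module (an approximation, since memoQ uses its own regex engine), `[^=]*` stops at the first `=`, so keys cannot contain `=` while values can. Any surrounding quotes end up in group 2, and lines with an empty value do not match. The sample keys are made up.

```python
import re

# The paragraph definition from the filter: group 1 = key and "=", group 2 = value.
INI_PARAGRAPH = re.compile(r"([^=]*=)(.+)")


def split_ini_line(line):
    """Return (non-translatable part, translatable part), or None if the line does not match."""
    m = INI_PARAGRAPH.match(line)
    return (m.group(1), m.group(2)) if m else None
```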


Post-processing translated content – Convert to SQL

• A script converts the translated HTML files into SQL files.

• The IDs extracted from the tags in the HTML identify the rows to be updated.
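A sketch of such a conversion script, assuming the export wrapped every database row in a hypothetical `<cms-row id="42">` element and that the target is a `content` table with a `body` column (all made-up names). Each wrapper yields one `UPDATE` statement keyed by the extracted ID; the usage part runs the generated SQL against an in-memory SQLite database.

```python
import re
import sqlite3

# Hypothetical wrapper written around each row by the export script:
#   <cms-row id="42">...translated HTML...</cms-row>
ROW_RE = re.compile(r'<cms-row id="(\d+)">(.*?)</cms-row>', re.DOTALL)


def sql_literal(text):
    """Quote text as a standard SQL string literal (single quotes doubled).

    MySQL in its default mode also treats backslashes as escape characters,
    so a MySQL target would need those escaped as well.
    """
    return "'" + text.replace("'", "''") + "'"


def html_to_sql(html, table="content", column="body"):
    """Turn every wrapped row into one UPDATE statement keyed by its ID."""
    statements = [
        f"UPDATE {table} SET {column} = {sql_literal(body.strip())} WHERE id = {int(row_id)};"
        for row_id, body in ROW_RE.findall(html)
    ]
    return "\n".join(statements)


# Usage: apply the generated SQL to a throwaway SQLite table.
translated = '<cms-row id="1"><p>Willkommen</p></cms-row>\n<cms-row id="2">Das ist\'s</cms-row>'
script = html_to_sql(translated)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT)")
db.executemany("INSERT INTO content VALUES (?, ?)", [(1, "<p>Welcome</p>"), (2, "That's it")])
db.executescript(script)
```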


Integrate localised content

• Manually by copying & pasting

• Available extensions

– that understand the I18N/L10N logic of the CMS

– that import the localized content

• Develop scripts

– to import the localized content


Importing localised content - Database


• Preview links with login information

• Overwrite mode


Importing localised content - Database

• The translated SQL file is imported into the database.

• The table rows are updated with the translated content along with other settings:

– Original text

– Modified date

– Published flag

– Hashed value
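The settings in this list can be maintained by the import script itself. A minimal sketch using `sqlite3`, assuming a hypothetical `translations` table: the source text is stored next to the translation, and its MD5 hash (used only as a change detector, not for security) lets a later run spot rows whose source changed after translation. Column names and the choice of MD5 are assumptions, not a specific CMS schema.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def import_translation(conn, row_id, source_text, translated_text, now=None):
    """Write a translation to its row and refresh the bookkeeping columns.

    Hypothetical columns: body, original_text, modified, published, source_hash.
    Returns the number of updated rows (0 means the ID was not found).
    """
    now = now or datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Hash of the source: a later run can compare it to spot outdated translations.
    source_hash = hashlib.md5(source_text.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "UPDATE translations SET body = ?, original_text = ?, modified = ?, "
        "published = 1, source_hash = ? WHERE id = ?",
        (translated_text, source_text, now, source_hash, row_id),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations (id INTEGER PRIMARY KEY, body TEXT, original_text TEXT, "
    "modified TEXT, published INTEGER, source_hash TEXT)"
)
conn.execute("INSERT INTO translations (id, published) VALUES (7, 0)")
updated = import_translation(conn, 7, "Welcome", "Willkommen", now="2012-05-10T10:00:00+00:00")
```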


Importing localised content – INI files

• Translated INI files are exported from memoQ

• These are stored in the appropriate folders on the web server
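Putting the exported files into the right folders can be scripted once the naming convention is known. The sketch assumes a Joomla!-style layout in which the language tag is both the folder name and the file-name prefix (for example `language/en-GB/en-GB.mod_login.ini`); other CMSs use different layouts, and the function name is illustrative.

```python
from pathlib import PurePosixPath


def target_ini_path(source_path, source_tag, target_tag):
    """Map e.g. language/en-GB/en-GB.mod_login.ini to language/de-DE/de-DE.mod_login.ini.

    Assumes the folder is named after the language tag and the file name starts
    with the same tag followed by a dot.
    """
    path = PurePosixPath(source_path)
    if path.parent.name != source_tag or not path.name.startswith(source_tag + "."):
        raise ValueError(f"{source_path} does not follow the <tag>/<tag>.<name>.ini layout")
    new_name = target_tag + path.name[len(source_tag):]
    return str(path.parent.parent / target_tag / new_name)
```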


Automate workflow?

• Watch export folders and use CMS API/script to import localized files
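A watched export folder can be implemented with plain polling. In this sketch, `new_files` reports files that have not been seen yet, and `watch` hands each one to `import_file`, a placeholder for whatever CMS API call or import script is used. Both function names, the file suffixes and the 30-second interval are assumptions.

```python
import os
import tempfile
import time


def new_files(folder, seen, suffixes=(".ini", ".xml", ".sql")):
    """Return files in folder not reported before, and remember them in seen."""
    found = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffixes) and entry.path not in seen:
                seen.add(entry.path)
                found.append(entry.path)
    return sorted(found)


def watch(folder, import_file, interval=30):
    """Poll folder forever and pass every new export to import_file.

    import_file stands in for a CMS API call or import script. A production
    version should also wait until a file has finished being written.
    """
    seen = set()
    while True:
        for path in new_files(folder, seen):
            import_file(path)
        time.sleep(interval)


# Usage without the endless loop: two polls of a temporary "export" folder.
with tempfile.TemporaryDirectory() as export_dir:
    seen = set()
    open(os.path.join(export_dir, "de-DE.mod_login.ini"), "w").close()
    first_poll = new_files(export_dir, seen)
    second_poll = new_files(export_dir, seen)
```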


Test localised content

• Find the localised content on the website (frontend)

• Proofread

• Layout check


Fix localization bugs

• Find out where the content to be updated came from

• Update content in CMS

• Update bilingual files and/or translation memory

• Modify stylesheets (CSS)

• …
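For strings that come from language files, a plain text search over a local copy of the site often answers the first question quickly. A sketch, assuming UTF-8 INI files (the sample key is made up); content stored in the database would need a similar search with SQL instead.

```python
import tempfile
from pathlib import Path


def find_string_source(root, visible_text):
    """Return (file, line number, line) for every INI line below root containing visible_text."""
    hits = []
    for path in sorted(Path(root).rglob("*.ini")):
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if visible_text in line:
                    hits.append((path.relative_to(root).as_posix(), line_no, line.strip()))
    return hits


# Usage: a fake site with one German language file.
with tempfile.TemporaryDirectory() as site:
    folder = Path(site, "language", "de-DE")
    folder.mkdir(parents=True)
    (folder / "de-DE.mod_login.ini").write_text(
        '; Login module\nMOD_LOGIN_REMEMBER_ME="Angemeldet bleiben"\n', encoding="utf-8"
    )
    hits = find_string_source(site, "Angemeldet")
```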


Publish localized page


Update!


Conclusion

• Complex processes

• Interaction between many people

• No standard procedures

• Need to develop processes and tools

• Risk of losing/missing data when trying to mimic CMS core functionality


Conclusion

Translation Service Provider

• Time-consuming

• Scoping is a non-trivial first step

• Expertise in CMS, web technology, databases

• Develop tools

• Educate client and web developers

• Sponsor development!

Client

• Choose CMS wisely

• I18N & L10N strategy

• Expect additional costs for localisation engineering and/or development

• Time-consuming!


Thank you very much for your attention!