Scale your Alfresco Solutions

Mike Farman, Product Manager, Alfresco

Peter Monks, Director, Professional Services, Alfresco

Derek Hulley, Senior Engineer, Alfresco

Scale Your Alfresco Solutions

Many areas to consider...

• Core Repository
• Web-tier load balancing and caching
• Scale-up/scale-out: horizontal vs. vertical
• Component tuning
• Replication strategies (3.4)
• Profiling and benchmarking
• ...

We’re going to focus on the Core Repository

Repository Services

What happens when you create a node?

1. Begin transaction
2. Create node in DB
3. Write stream to disk
4. Update DB content URL
5. Begin commit
6. Transform (extract) text
7. Update index (props & content)
7a. Index full text (background)
8. Commit (transaction ID for index tracking)
9. Add to L2 cache

Content indexing is automatically moved to the background if text extraction exceeds 20 ms
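The 20 ms rule on this slide can be read as a simple threshold decision. The sketch below is only an illustration of that rule: the function name `indexing_mode` and its parameters are hypothetical, and the threshold is assumed to correspond to the lucene.maxAtomicTransformationTime property that appears later in this deck.

```python
def indexing_mode(extraction_ms: float, max_atomic_ms: float = 20) -> str:
    """Decide where content indexing happens for one document.

    extraction_ms: time the text extraction (transformation) took.
    max_atomic_ms: threshold above which indexing leaves the transaction
                   (assumed to map to lucene.maxAtomicTransformationTime).
    """
    if extraction_ms > max_atomic_ms:
        return "background"
    return "in-transaction"


if __name__ == "__main__":
    for ms in (5, 20, 21):
        print(ms, "ms ->", indexing_mode(ms))
```

Under this reading, setting the threshold to 0 pushes content indexing to the background for any extraction that takes measurable time.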

Repository Services

What happens when you query for nodes?

1. Query (Lucene)
2. Result set
3. Batch pre-fetch
4. In cache?
4a. DB fetch
5. Result set
6. Check permissions (bounded by max permission checks and timeout)
7. Deliver results
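The two limits on the permission-check step can be sketched as a loop that stops once either a check budget or a time budget is used up. This is a minimal illustration, not Alfresco's implementation: `filter_by_permission`, `has_permission` and the default values are hypothetical names and placeholders.

```python
import time
from typing import Callable, Iterable, List


def filter_by_permission(
    node_ids: Iterable[str],
    has_permission: Callable[[str], bool],
    max_checks: int = 10000,
    max_time_ms: int = 10000,
) -> List[str]:
    """Return the permitted nodes, stopping once either limit is reached."""
    deadline = time.monotonic() + max_time_ms / 1000.0
    permitted: List[str] = []
    for checks_done, node_id in enumerate(node_ids):
        # Either budget exhausted: the caller gets a truncated result set.
        if checks_done >= max_checks or time.monotonic() > deadline:
            break
        if has_permission(node_id):
            permitted.append(node_id)
    return permitted


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(10)]
    print(filter_by_permission(nodes, lambda n: True, max_checks=3))
```

The consequence worth noting is that a query can return fewer results than actually match when a limit is hit, which is why results should be tested as different users and not only as admin.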

Repository Services

What happens when you read a node's content?

1. Node read request
2. Cached?
3. DB lookup
4. Fetch content
5. Stream response


Example Use Cases:

• UC01: Bulk Loading
  • High batch throughput, ongoing
    • e.g. scanning, archival solutions, systems of record
  • Migration
    • One-off migration to Alfresco from legacy system
    • Then UC02...

• UC02: Enterprise Collaboration Platform
  • Concurrent users, variety of interfaces
  • e.g. Team/Project Collaboration, Document/Knowledge Management

UC01: Bulk Loading

Typical Characteristics

• Large number of documents and high throughput
  • Tens of thousands of documents injected per day, often during nightly hours
  • Tens of millions of documents per year

• Low user concurrency
  • 100-1000 users (read-only access)

• Application profile – System of Record
  • End users mostly search & read
  • Document formats: PDF, TIFF, JPG (i.e. no full text indexing)
  • Typically fixed metadata
  • No or little version control
  • Few to no rules, actions, workflows, content transformations

• Client Interfaces
  • Share/Explorer or custom, e.g. Web Scripts, CMIS
  • Typically little CIFS/WebDAV/FTP

UC01: Bulk Loading

• Parallel processing
  • Load nodes simultaneously

• Avoid unnecessary in-transaction processing
  • In-transaction services often not required when loading
    • e.g. Transformation, Indexing

• Disable unneeded services
  • Many standard services are not required when loading

• Minimise network and file I/O operations
  • Get source content as close to server storage as possible

• Always benchmark and tune...
  • JVM, Network, Threads, DB Connections...

Primary Objective is to Maximise Throughput

UC01: Bulk Loading

Architectural considerations

• Creation is CPU, memory and network intensive
  • Always 64 bit
  • Rule of thumb: prefer scale up over scale out (simpler deployment and management)
  • Rule of thumb: get the content as close as possible to Alfresco

• Nature of the data set (i.e. batches) is KEY
  • If batches are sequential -> minimize time-per-batch
    • Scale up in CPU and memory
  • If batches are parallelizable -> maximize number of batches processed
    • Scale out with multi-threaded uploads
  • Consider dedicated server(s) for ingestion
    • Use production servers for migration use case and then reconfigure

• Design content storage around your data
  • How can you get the source content as close as possible to repository content storage?

• Note: Avoid Sparc T and related series
  • Highly parallel but not suited for atomic heavy serial operations

UC01: Bulk Loading

Tuning best practices - JVM

• 64 bit
• Make NewSize as large as possible to avoid spill over to OldGen
• See http://wiki.alfresco.com/wiki/JVM_Tuning

Tuning – Application Server

• Pay attention to the machine capacity, i.e.
  • Threads
  • CPU utilization
  • I/O

Sample JVM config: 64-bit, dual 2.6GHz Xeon / dual-core per CPU, 8GB RAM environment

-server -Xss1M -Xms2G -Xmx3G -XX:NewSize=1G -XX:MaxPermSize=256M

JVM Example - Visual GC

[Figure: two Visual GC screenshots, labelled Bad and Good]

UC01: Bulk Loading

Tuning best practices – I/O

• Network
  • Alfresco to database is key
    • Latency is key, e.g. 10 ms is the absolute maximum
    • JDBC fetch size should be 150
    • See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration
  • Alfresco to storage (if remote)
    • If possible, avoid it completely for file transfers: stage content on local disks
    • Use a dedicated network for storage, e.g. Fibre Channel
  • Incoming to Alfresco: typically not relevant for the bulk loading use case

• Disk
  • Lucene index operations are disk I/O intensive
    • Fast reads/writes, i.e. local disk
    • Avoid indexing if not required
  • Avoid unnecessary content file copying
    • Stage content on local disks
    • Consider setting the cm:content property directly, e.g.
      • contentUrl=store://mypath/mydocument.docx|mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document|size=51142|encoding=UTF-8|locale=en_GB_
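The cm:content value shown above is a pipe-separated string of key=value pairs. As a rough sketch, a loader could assemble it as follows; the helper name `content_data_string` and its defaults are hypothetical, and the field order simply mirrors the example on the slide.

```python
def content_data_string(
    content_url: str,
    mimetype: str,
    size: int,
    encoding: str = "UTF-8",
    locale: str = "en_GB_",
) -> str:
    """Assemble a cm:content value in the pipe-separated form shown on the slide."""
    if not content_url.startswith("store://"):
        # The slide's example uses a store:// URL; anything else is rejected here.
        raise ValueError("content URL is expected to start with store://")
    return (
        f"contentUrl={content_url}|mimetype={mimetype}|size={size}"
        f"|encoding={encoding}|locale={locale}"
    )


if __name__ == "__main__":
    docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    print(content_data_string("store://mypath/mydocument.docx", docx, 51142))
```

The size must match the staged file on disk, so in practice it would be read from the file system and not hard-coded.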

UC01: Bulk Loading

Tuning best practices - Database

• Connections – relevant if you are loading concurrently
  • See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration

• DB indexes & statistics
  • Plan your batch loads to allow for periodic statistics maintenance

• Make sure the database hardware/software is sized appropriately, e.g.
  • Log sizes, flush on transaction commit, cache tuning, lock management...
  • Use of multiple physical volumes/RAID...

• All databases provide many options to optimise performance
  • Get a DB administrator or partner involved

UC01: Bulk Loading

Tuning best practice - Repository Services

• Force background indexing
  • alfresco-global.properties
    • Everything: index.tracking.disableInTransactionIndexing=true
    • Just content: lucene.maxAtomicTransformationTime=0

• Is content indexing required at all?
  • DoNotIndex aspect

• “Run As” system user to avoid permission checking

UC01: Bulk Loading

Tuning best practice - Repository Services

• Use an optimised custom bulk loader
  • Process docs in batches, not 1 doc per transaction or 1 transaction for the entire content set
    • Example: 100 documents per batch
  • Use Foundation (Java) API if possible

• Design multi-threaded import code
  • Partition your data set so you can use multiple threads loading in different areas
  • Scale up CPU accordingly

• Consider direct APIs (e.g. the lower-case “nodeService” bean rather than the public “NodeService”)
  • Public services are heavily wrapped with interceptors for transactions, auditing, permissions, multilingual translations, etc.

• Disable behaviours
  • Rule evaluation, cm:auditable, versioning, quotas (system.usages.enabled=false)

• Use proper transaction demarcation
  • Complete all operations on a node in a single transaction
  • Batching: group multiple updates in a single transaction
  • Avoid mixing reads and writes

• See session CS2-Repository_Internals for more details on API specifics
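A minimal sketch of the batching and multi-threading advice above, assuming a hypothetical `load_batch` callable that creates every node of one batch inside a single repository transaction and returns the number of nodes it created. The names `batches`, `bulk_load` and `load_batch` are illustrative only and not part of any Alfresco API; a real loader would more likely be written against the Foundation (Java) API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Split the data set into fixed-size batches (one transaction each)."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def bulk_load(
    documents: Sequence[str],
    load_batch: Callable[[List[str]], int],
    batch_size: int = 100,
    threads: int = 4,
) -> int:
    """Load all batches on a thread pool and return the total nodes created."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(load_batch, batches(documents, batch_size)))


if __name__ == "__main__":
    docs = [f"doc-{i}.pdf" for i in range(250)]
    # Stand-in for a real loader: pretends every document was created.
    print(bulk_load(docs, lambda batch: len(batch)))
```

The batch size and thread count are tuning knobs: 100 documents per batch follows the slide, while the thread count should be benchmarked against available CPU and database connections.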

UC01: Bulk Loading

Tuning best practices – Repository Services

• Disable modified timestamp propagation to parent folders
  • system.enableTimestampPropagation=false (default)

• Deleting large numbers of nodes
  • Skip deleted items (archive) by adding the sys:temporary aspect to your content before deletion

• Partition your content within the repository
  • Depends on read access requirements
  • Consider partitioning if there are more than 2000 nodes per space and users browse space children

Note: Performance is much improved in later releases (3.3.3, 3.4); test for your use case
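One way to keep the number of nodes per space low is to derive a folder path from each document's name. The sketch below uses a hash prefix, which is an assumption made for illustration and not something the presenters prescribe; date-based or business-key partitioning may suit read access patterns better. The function name `partition_path` is hypothetical.

```python
import hashlib


def partition_path(doc_name: str, levels: int = 2, width: int = 2) -> str:
    """Return a folder path such as 'ab/cd/<doc_name>' derived from a hash.

    With 2 levels of 2 hex characters there are 256 * 256 leaf folders,
    so documents spread evenly and no single space grows too large.
    """
    digest = hashlib.sha1(doc_name.encode("utf-8")).hexdigest()
    folders = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(folders + [doc_name])


if __name__ == "__main__":
    print(partition_path("invoice-0001.pdf"))
```

The mapping is deterministic, so the same document name always resolves to the same folder, which keeps later lookups cheap.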

UC01: Bulk Loading

Scale Out Using Dedicated Bulk Load Server(s)

• Alfresco can support a non-clustered, injection-only tier

• Objective: Separate the input write process from the front-end read load

• Solution: Dedicated injection tier pointing to the same DB/content store(s) as the front-end servers. No need to cluster caches from this tier with the front end. Index properties and/or content in the background; indexes will catch up from DB transactions.

• Benefits: No cache update/invalidation overhead. Indexing does not block the loading process

Bulk Load Architecture Example

Bulk load server(s) are not clustered but share storage and DB; production servers will ‘catch up’ via index tracking

[Figure: production servers A, B and C and bulk load servers A and B, each running Tomcat with EHCache and its own Lucene index, all sharing one database (MySQL) and one content store. Runtime clients connect to the production servers; the bulk load process (creates only) connects to the bulk load servers.]

UC01: Bulk Loading

Load Server(s) Configuration Tips

• Bulk Load Server(s)
  • To exclude server(s) from the cluster:
    • Do not set a cluster name for bulk load servers in alfresco-global.properties
      • alfresco.cluster.name=
  • Force background indexing in the local alfresco-global.properties using:
    • Everything:
      • index.tracking.disableInTransactionIndexing=true
    • Just content:
      • lucene.maxAtomicTransformationTime=0
  • Note: The load process should perform creates only, no updates or reads

• Production Server(s)
  • Ensure index tracking is enabled:
    • index.tracking.cronExpression=0/5 * * * * ?
    • index.recovery.mode=AUTO

UC01: Bulk Loading

Example: In-transaction vs. Background Indexing

• 10,000 docs, 1,000 folders
• 50 KB Word documents
• FTP with 10 sessions
• Laptop

• Foreground indexing:
  • 33 mins

• Background indexing:
  • 5 mins
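The figures in this example translate into throughput as follows. This is plain arithmetic on the slide's numbers (10,000 documents, 33 minutes versus 5 minutes); the function name `docs_per_second` is just an illustrative helper.

```python
def docs_per_second(doc_count: int, minutes: float) -> float:
    """Average ingestion rate over the whole run."""
    return doc_count / (minutes * 60)


foreground = docs_per_second(10_000, 33)  # in-transaction indexing
background = docs_per_second(10_000, 5)   # background indexing
speedup = background / foreground

if __name__ == "__main__":
    print(f"foreground: {foreground:.1f} docs/s")
    print(f"background: {background:.1f} docs/s")
    print(f"speedup:    {speedup:.1f}x")
```

That is roughly 5 documents per second with foreground indexing against about 33 per second with background indexing, a 6.6-fold difference on the same hardware.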



UC02: Enterprise Collaboration Platform

Requirements

• High (and potentially highly distributed) user concurrency
  • 1,000s to 10,000s of users (read & write)

• Medium/high number of documents
  • 10,000 to 1 million+ documents
  • 1,000 document updates per day

• Complex enterprise content and permission models
  • Multiple content models/dynamic ACLs
  • Versioning and full text indexing on all documents
  • Document types: Office, drawings, images

• Advanced content management
  • Multiple rules and actions
  • Heavy use of content transformations/workflow

• Interfaces (all)
  • Share, WebDAV, CIFS...


Architectural considerations

• Fully fledged platform deployment
  • Need to consider maintenance window

• Scale out Share independently from Repo
  • Front and intermediate load balancer/web cache layers
  • Read/write split and scheduled repository exclusion for maintenance

• Scale out transformation server
  • Enterprise only: JOD OpenOffice subsystem

• Scale out and up infrastructure
  • Cluster CIFS with DFS (Distributed File System)
  • All HTTP based protocols scale seamlessly (SSP on port 7070)

• Balance multi-CPU (scale up) and multi-node clusters (scale out)
  • Overhead of index tracking


Design best practices

• Distribute your content within the repository
  • Otherwise search and retrieval performance degradation is likely
  • Use versioning and indexing where appropriate, not just because it’s there
    • e.g. don’t simply apply cm:versionable to the full cm:content

• Modelling
  • Prefer aspects over types
    • Remember aspects support inheritance as well
  • Content model indexing options
    • Tune what you need to index

• Quotas (aka Usages)
  • Might save your repo from content explosion but also have an overhead!


Tuning best practices – Note: Also see bulk load use case!

• RDBMS
  • Number of connections is much more important for this use case
  • Formula: HTTP worker threads + 75 per cluster node
    • For Tomcat defaults this is 275

• Cache configuration
  • L2 cache: increase with RAM to include more objects in cache
  • Use the ehcache tracing tool to identify which caches have low hit ratios, and increase them if you have available memory
  • See http://wiki.alfresco.com/wiki/Repository_Cache_Configuration#Tracing_cache_sizes for details

• Alfresco configuration optimization
  • VFS thread pool tuning (default: <threadPool init="25" max="50" />)
  • Tune ACLs and preload common searches (if needed)
    • system.acl.maxPermissionCheckTimeMillis=10000
    • system.acl.maxPermissionChecks=10000
    • Query via the node browser as different users, not only admin
  • Consider bulk loading large user bases (10,000s) to a single (un-clustered) node and then clustering
  • Disable eager home folder creation
    • home.folder.creation.eager=false in alfresco-global.properties
  • Use multi-threaded and incremental LDAP sync once the initial sync has been completed
    • Differential sync is the default

• Lucene tuning
  • lucene.maxAtomicTransformationTime=20

• Monitor the network performance when adding nodes to a cluster
  • Watch for ehcache waiting for the network via thread dumps
  • Consider disabling some/all of the L2 caches
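The connection formula from the RDBMS bullet on this slide can be worked through as below. The per-node figure follows the slide (worker threads + 75); summing it across nodes to size the database side is an assumption added here, and the function names are hypothetical. The 200 used in the example is Tomcat's default maxThreads, which is what yields the slide's 275.

```python
def db_pool_max(http_worker_threads: int, overhead: int = 75) -> int:
    """Connections one cluster node needs: HTTP worker threads + 75."""
    return http_worker_threads + overhead


def total_db_connections(cluster_nodes: int, http_worker_threads: int) -> int:
    """Connections the database must accept if every node fills its pool.

    Assumption: the database limit should cover the sum over all nodes.
    """
    return cluster_nodes * db_pool_max(http_worker_threads)


if __name__ == "__main__":
    print("per node:", db_pool_max(200))
    print("3-node cluster:", total_db_connections(3, 200))
```

The per-node result is the value to consider for db.pool.max, while the cluster-wide total is a figure to check against the database's own connection limit.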


Example Windows ECM Production Cluster Install - Local & Shared Content Store

[Figure: two application servers (alfappsrv01 running Tomcat 1 and alfappsrv02 running Tomcat 2, with clustered EHCache) sit behind HTTP load balancers for HTTP clients (e.g. Share) and DFS round robin for CIFS (via \\alfrescocifs). Active Directory provides user/group sync and NTLM authentication. The application servers reach an MSCS cluster (alfclustsrv01 with Oracle 1 and alfclustsrv02 with Oracle 2, with failover between them) over JDBC as oraclecluster, backed by a SAN.]

• Local alf_data on each application server
  • Lucene index: d:\alf_store\lucene-indexes
  • Content store: d:\alf_store\contentstore, with in- and outbound replication to the shared content store on the SAN

• SAN
  • Shared content store: sharedContentStore (\\alfdata\Data\store)
  • Oracle: Data (o:\oradata\alfresco), Control (o:\oradata\alfresco) & Logfiles (L:\oradata\alfresco); Oracle Backup (o:\flash_recovery_area)
  • Lucene index backup (\\alfdata\Hold)

• Replicating Content Store
  • In- and outbound replication between local and shared content store


Replication (3.4) offers new deployment options

• Replication may be appropriate for specific contexts
  • Provides selective replication of content between distinct Alfresco repositories
  • On demand or scheduled via Replication Jobs
  • Reporting and tracking of Replication Jobs

• Read and viewing performance: content is served from a local server

Scaling Tips

For any system...

• Do not use the OOTB settings for the application server, database, Alfresco etc.: you must always tune for your use case
• Balance your resources
  • Separate tiers for database, content, app servers
• Indexes should always be on fast, local disk, e.g. not NFS mounts, USB drives etc.
• Run on a supported stack
  • e.g. there are issues with JDK 1.6u10, so use JDK 1.6u20; use MySQL 5.1.39 or later
• Don’t starve your database of connections:
  • db.pool.max=XXX
• Use appropriate application server worker threads
  • Configuration details are application server specific, e.g. Tomcat: server.xml
• When clustering, use JGroups and Unicast
• Use the latest Alfresco version/service pack, e.g.
  • 3.3.3, 3.4

Scaling Tips

Things you should NOT change

• The database transaction isolation level
  • Use defaults for all databases except MS SQL Server
  • FYI, SQL Server should be:
    • db.txn.isolation=4096
    • ALTER DATABASE alfresco SET ALLOW_SNAPSHOT_ISOLATION ON;

• The ehcache default configuration, i.e. replicate async
• The Lucene indexing defaults, unless you know what you are doing and why!
• Note: Also do not do a full index rebuild unless you know what was wrong in the first place!
  • Use the index checker


Benchmark your solutions



Alfresco Benchmarks

• Alfresco Benchmark Tools
  • alfresco-bm – http://wiki.alfresco.com/wiki/Server_Benchmarks
  • SimpleInjector – (check partners.alfresco.com)
  • For CIFS loading -> JMeter + SMB mount

• Alfresco Benchmark Results
  • Unisys benchmark results
  • JCR Benchmarks

• WIP
  • “Scale your Alfresco Solutions” (in http://partners.alfresco.com)
  • More platform benchmarks ongoing – watch this space!


Profiling your Alfresco solution

• Alfresco Application Profiling
  • JMX (for Enterprise only, see Admin Guide): http://wiki.alfresco.com/wiki/JMX
  • Audit Surf: http://forge.alfresco.com/projects/auditsurf/
  • Nagios integration: http://forge.alfresco.com/projects/nagios4alfresco/

• Infrastructure Profiling
  • VisualVM (JVM): http://ur.ly/esjZ
  • Thread Dump Analyzer: https://tda.dev.java.net/
  • YourKit (JVM): http://wiki.alfresco.com/wiki/JMX
  • WireShark (Network): http://www.wireshark.org/
  • MySQL Query Profiler (DBMS): http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html


Q/A & Feedback

• Any questions?
• Share your experiences (good and bad) with us so we can all learn!
  • Successful scaled up/out architectures
  • Limitations, bottlenecks
  • Use case parameters => Implementation => Results
  • What worked, what didn’t