Created in: 2006-04-21 19:06:22
Author: martin
Size: 14303 bytes
Last updated: 2006-04-21 19:06:22
This guide will help you configure your jLibrary clients and servers to achieve optimal performance.
jLibrary server is currently based on Apache Jackrabbit, the reference implementation of JSR-170. As with Apache Jackrabbit, to configure jLibrary you have to edit the repository.xml file, which you can find at WEB-INF/lib/repository/repository.xml. The following sections describe the different configuration possibilities within jLibrary.
Configuring different storage locations
The jLibrary configuration file contains several places where you have to define the directories in which content will be stored. For example, the first section of the jLibrary configuration lets you change the location on the file system where the jLibrary repositories' system data will be created. You can see it here:
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="${rep.home}/repository"/>
</FileSystem>
As you can see, this section (like many others in the configuration file) uses a special variable, ${rep.home}. The real location of this variable on the file system depends on the application server in which you are running jLibrary. For example, on Apache Tomcat it will be the bin directory. You probably won't want to store content there. In that case, you can replace every occurrence of ${rep.home} with a custom location. You can even use different locations for different things. This is a sample with the location changed:
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
    <param name="path" value="/home/jlibrary/repository"/>
</FileSystem>
These are all the places in which the ${rep.home} variable appears and that you can easily replace with other locations:
Configuring user access
jLibrary access can be configured through JAAS login modules. By default, access to jLibrary repositories is managed by jLibrary itself, which means that jLibrary users and groups are stored directly within the jLibrary repositories. To implement this scheme, jLibrary uses the SimpleLoginModule:
<LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
    <param name="anonymousId" value="anonymous" />
</LoginModule>
This is a good solution, but if you already have your own user directory, a user database, or some other solution, you will probably want to use it instead. Fortunately, you can change the LoginModule configuration to suit your needs. This is an example with the JBoss properties-based login module:
<LoginModule class="org.jboss.security.auth.spi.UsersRolesLoginModule">
    <param name="usersProperties" value="users.properties" />
    <param name="unauthenticatedIdentity" value="nobody" />
</LoginModule>
The parameters you need depend on the LoginModule you choose. jLibrary does not ship any LoginModule with the server, because we think that is not our business: such modules would be hard to maintain and, consequently, a drawback for our users. Fortunately, most application servers come with several predefined LoginModules ready to use that you can take advantage of. Another option is to write your own LoginModule, which is not really a hard task.
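As a sketch, the JBoss example can be extended so that users are also mapped to roles. This assumes the JBoss UsersRolesLoginModule classes are visible to jLibrary and that the module receives its usual options (usersProperties, rolesProperties, unauthenticatedIdentity) from these param elements, as in the example above; the file names are placeholders.

```xml
<!-- Sketch: JBoss properties-based login module with an explicit roles file.
     users.properties and roles.properties are placeholder names; JBoss looks
     them up as resources on the application server classpath. -->
<LoginModule class="org.jboss.security.auth.spi.UsersRolesLoginModule">
    <!-- one "username=password" entry per line -->
    <param name="usersProperties" value="users.properties" />
    <!-- one "username=role1,role2" entry per line -->
    <param name="rolesProperties" value="roles.properties" />
    <!-- identity assigned to unauthenticated access -->
    <param name="unauthenticatedIdentity" value="nobody" />
</LoginModule>
```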
Configuring databases
By default, jLibrary is configured to use an embedded Apache Derby database. This is set up through a persistence manager located in the workspaces section:
<PersistenceManager class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
    <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    <param name="externalBLOBs" value="false"/>
</PersistenceManager>
In the persistence manager configuration you can find several interesting parameters:
- url: the JDBC URL of the database. With Derby, create=true creates the database if it does not exist yet.
- schemaObjectPrefix: a prefix added to the names of the tables created by the persistence manager; ${wsp.name}_ keeps the tables of each workspace apart.
- externalBLOBs: if true, binary content is stored on the file system instead of in the database.
- driver, user and password: the JDBC driver class and the database credentials, used by the generic SimpleDbPersistenceManager.
- schema: the type of database schema to create, for example mysql.
Following this little summary, you could for example create a persistence manager for a MySQL database with the following lines:
<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://localhost/jlibrary"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    <param name="externalBLOBs" value="false"/>
    <param name="user" value="martin"/>
    <param name="password" value=""/>
</PersistenceManager>
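If your repositories hold many large documents, you may prefer to keep binary content out of the database. The sketch below is the MySQL configuration above with externalBLOBs switched to true, so that binary values are written to the file system rather than into database tables. The host name, database name and credentials are placeholders, not values jLibrary expects.

```xml
<!-- Sketch: MySQL persistence with binary content kept on the file system.
     dbhost, the jlibrary database, the user and the password are placeholders. -->
<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://dbhost:3306/jlibrary"/>
    <param name="schema" value="mysql"/>
    <!-- table names are prefixed with the workspace name -->
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    <!-- true: binary values go to the file system, not to the database -->
    <param name="externalBLOBs" value="true"/>
    <param name="user" value="jlibrary"/>
    <param name="password" value="changeme"/>
</PersistenceManager>
```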
Using DataSources
As you can see, the above definition uses a hard-coded JDBC connection. This has another important consequence: when you use the SimpleDbPersistenceManager, a single connection is shared by all jLibrary users, and consequently the system won't scale up very well.
Fortunately, jLibrary allows you to work with your application server's DataSources. Working directly with an application server DataSource has many benefits, from easier administration to better scalability and performance. To use an external DataSource, define a JNDIDatabasePersistenceManager instead of the one above:
<PersistenceManager class="org.apache.jackrabbit.core.state.db.JNDIDatabasePersistenceManager">
    <param name="dataSourceLocation" value="jdbc/MyDataSource"/>
</PersistenceManager>
Configuring a DataSource on your application server is outside the scope of this document. Refer to the documentation of your application server for more information on that topic.
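As a sketch, the JNDI persistence manager can be combined with the schema settings used in the MySQL example. This assumes that JNDIDatabasePersistenceManager accepts the same schema, schemaObjectPrefix and externalBLOBs parameters as the other database persistence managers, and that the DataSource points to a MySQL database; the JNDI name is a placeholder whose exact form depends on your application server (on Tomcat, resources are usually looked up under java:comp/env/).

```xml
<!-- Sketch: DataSource-based persistence for a MySQL database.
     The JNDI name is a placeholder; check the naming rules of your server. -->
<PersistenceManager class="org.apache.jackrabbit.core.state.db.JNDIDatabasePersistenceManager">
    <param name="dataSourceLocation" value="java:comp/env/jdbc/MyDataSource"/>
    <!-- assumed to behave as in SimpleDbPersistenceManager -->
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    <param name="externalBLOBs" value="false"/>
</PersistenceManager>
```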
Tuning the search index
jLibrary uses Apache Lucene to index all the repository contents. We are proud to have contributed the original jLibrary text filter classes to Apache Jackrabbit. The search index can be configured in the SearchIndex section:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.MsExcelTextFilter, org.apache.jackrabbit.core.query.MsPowerPointTextFilter, org.apache.jackrabbit.core.query.MsWordTextFilter, org.apache.jackrabbit.core.query.PdfTextFilter, org.apache.jackrabbit.core.query.HTMLTextFilter, org.apache.jackrabbit.core.query.XMLTextFilter, org.apache.jackrabbit.core.query.RTFTextFilter, org.apache.jackrabbit.core.query.OpenOfficeTextFilter" />
    <!-- These are all default values. You can change them if you want -->
    <param name="useCompoundFile" value="true"/>
    <param name="minMergeDocs" value="100"/>
    <param name="volatileIdleTime" value="3"/>
    <param name="maxMergeDocs" value="100000"/>
    <param name="mergeFactor" value="10"/>
    <param name="bufferSize" value="10"/>
    <param name="cacheSize" value="1000"/>
    <param name="forceConsistencyCheck" value="false"/>
    <param name="autoRepair" value="true"/>
    <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
    <param name="idleTime" value="-1"/>
    <!-- end of default values -->
    <param name="respectDocumentOrder" value="false"/>
</SearchIndex>
You can change some of these values to tune for the best performance. For example, you can disable some of the text filter classes, or replace them with your own classes if you find them slow, or you can try increasing the buffer and cache sizes. You can also provide your own Lucene analyzer, for example an analyzer for your language or a Snowball analyzer. Finally, jLibrary sets respectDocumentOrder to false by default, so results of queries without an order by clause are not returned in document order; this gives better performance.
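A sketch of a leaner search index, assuming a repository that mostly stores PDF and Word files and whose documents are written in German. The buffer and cache values are illustrative only, the analyzer choice is hypothetical and requires the Lucene German analyzer class on the classpath, and an existing index would probably have to be rebuilt after changing the analyzer.

```xml
<!-- Sketch: fewer text filters, larger buffers, a language-specific analyzer.
     All tuning values are illustrative; measure before and after changing them. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- only the filters for the formats actually stored -->
    <param name="textFilterClasses" value="org.apache.jackrabbit.core.query.PdfTextFilter, org.apache.jackrabbit.core.query.MsWordTextFilter" />
    <!-- larger than the defaults of 10 and 1000 -->
    <param name="bufferSize" value="100"/>
    <param name="cacheSize" value="5000"/>
    <!-- hypothetical language choice -->
    <param name="analyzer" value="org.apache.lucene.analysis.de.GermanAnalyzer"/>
    <param name="respectDocumentOrder" value="false"/>
</SearchIndex>
```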
Versioning configuration
In jLibrary, version metadata and content are stored separately from the repository metadata and content for better performance. If you want, you can configure a different FileSystem and a different persistence manager for the version storage. That is the purpose of the Versioning section of the configuration file:
<Versioning rootPath="${rep.home}/versions">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/versions"/>
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
        <param name="url" value="jdbc:derby:${rep.home}/versions/db;create=true"/>
        <param name="schemaObjectPrefix" value="versions_"/>
        <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
</Versioning>
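For example, a sketch that moves the version storage to its own directory and stores version data in MySQL through the SimpleDbPersistenceManager shown earlier. The directory, database URL and credentials are placeholders; versions_ keeps the version tables apart from the workspace tables if they share the same database.

```xml
<!-- Sketch: version storage in a dedicated directory and a MySQL database.
     /data/jlibrary/versions, the URL, user and password are placeholders. -->
<Versioning rootPath="/data/jlibrary/versions">
    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="/data/jlibrary/versions"/>
    </FileSystem>
    <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://localhost/jlibrary"/>
        <param name="schema" value="mysql"/>
        <!-- separate table prefix for version data -->
        <param name="schemaObjectPrefix" value="versions_"/>
        <param name="externalBLOBs" value="false"/>
        <param name="user" value="jlibrary"/>
        <param name="password" value="changeme"/>
    </PersistenceManager>
</Versioning>
```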
Changing system storage
You probably should not change the system search index configuration, but you can if you want. At the bottom of the configuration file you will find another search index element:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${rep.home}/repository/index"/>
</SearchIndex>
This search index stores system information, that is, internal information for Jackrabbit. You can remove this search index section so that this information is not indexed, which will probably improve performance. Be careful, though: version content is indexed in this system index, so if you remove it you won't be able to search document version history. jLibrary does not use this feature, but some users may be interested in it. It's your choice.
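If you decide to drop the system index, one way to do it while keeping the original settings at hand is to comment the element out rather than delete it. This is a sketch of that approach; remember that version history will no longer be searchable.

```xml
<!-- System search index disabled: version history is no longer searchable.
     Remove the surrounding comment markers to enable it again.
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${rep.home}/repository/index"/>
</SearchIndex>
-->
```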
Even though most customization and tuning features are located on the server side, you can also make a few tweaks on the client side to achieve better performance. These tuning parameters must be applied to each jLibrary repository separately. To customize a repository, first open the repository editor by double-clicking the repository. Then go to the Advanced section, where you will see the different tuning options.
Document metadata extraction
If you uncheck the "Automatically extract document metadata" option, jLibrary won't try to extract metadata from documents. This can greatly improve client performance, especially when you add hundreds or thousands of documents at a time. The metadata extraction process can take several seconds, especially on big files, so disabling it can give a huge performance increase.
Physical document deletes
If you disable physical document deletes, then each time you remove a document from jLibrary, that document won't really be deleted; it will remain in the repository content forever. This option is only useful when you have restrictions (for example, legal requirements) that force you to keep document history for several months or years. If you do not have such a restriction, you should not disable physical deletes: it is always better to physically delete documents from repositories, because removing them frees space and makes repository indexes lighter.
Lazy load nodes
This is clearly the option that can give you the biggest performance gain. If you check the lazy checkbox, jLibrary will load the repository nodes lazily. In lazy mode, jLibrary only requests the nodes it really needs. For example, the first time you load a repository, jLibrary only fetches the repository root from the server. Then, when you open a node, jLibrary only fetches that node's children. On the other hand, if you disable lazy mode, jLibrary will try to download the whole repository structure at once. That works well with small repositories, but with big repositories it can seriously reduce system scalability, so lazy mode is the recommended way to work with jLibrary.