This interface documentation has been generated from the base C++ interface, with some additional constructs injected, like namespaces and helper classes, that are specific to the Java interface. Unfortunately it still contains some type qualifiers pointing to its C++ origin. Please don't blame it for that. I'll find other solutions. Suggestions are welcome.
The strus Java bindings provide a Java interface for accessing the retrieval storage, indexing documents and queries and evaluating queries.
The entry point of a strus application with Java is the context (net.strus.api.Context) object. It is the root object from which all other objects are created. All classes of the strus Java API are in the package net.strus.api. The context can be constructed either as a proxy that redirects all method calls to an RpcServer, or as an instance running locally in the Java/JNI environment.
This example shows the creation of the root object Context that accesses the storage directly.
String config = "path=storage; metadata=doclen UINT16";
net.strus.api.Context ctx = new net.strus.api.Context();
net.strus.api.StorageClient storage = ctx.createStorageClient( config);
This example shows the creation of the root object Context as an RPC proxy. The storage client is created with an empty configuration string, because in this case the storage configuration is defined on the server side.
String rpcServer = "localhost:7181";
Context ctx = new Context( rpcServer);
net.strus.api.StorageClient storage = ctx.createStorageClient("");
Create a collection of documents (without using the strus analyzer)
In the Java universe there exist a lot of alternatives for analyzing or indexing a document. This example accommodates this and shows how to insert a test collection whose terms were produced without using the strus analyzer. The terms are assumed to come from "somewhere"; a naive way to produce them with plain Java is sketched below.
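For illustration only, here is a hypothetical helper (not part of the strus API) that produces term arrays of the kind used in the example below, by lowercasing a text and splitting it at non-letter characters. A real application would also apply stemming:

// Hypothetical helper, not part of the strus API: produce search index
// terms by lowercasing and splitting at non-letter characters. Empty
// strings in the result are skipped by insertDoc() below.
public static String[] naiveTerms( String text)
{
    return text.toLowerCase().split( "[^\\p{L}]+");
}

The full example program: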
package net.strus.example;
import net.strus.api.*;
import java.io.*;
import java.util.List;

public class CreateCollectionNoAnalyzer
{
    // Build a document from already analyzed content and insert it within
    // the transaction passed as first argument:
    public static void insertDoc( StorageTransaction transaction, String docid, String[] searchIndex, String[] forwardIndex, String title)
    {
        Document doc = new Document();
        // Define the search index terms (type "word") with their word positions:
        int pos = 0;
        for (String item : searchIndex)
        {
            ++pos;
            if (item.length() > 0)
            {
                doc.addSearchIndexTerm( "word", item, pos);
            }
        }
        // Define the forward index terms (type "orig") with their word positions:
        pos = 0;
        for (String item : forwardIndex)
        {
            ++pos;
            if (item != null)
            {
                doc.addForwardIndexTerm( "orig", item, pos);
            }
        }
        // Define the title attribute and insert the document:
        doc.setAttribute( "title", title);
        transaction.insertDocument( docid, doc);
    }

    public static void main( String[] args) {
        // Take the storage configuration from the command line, if specified:
        String config = "path=storage; metadata=doclen UINT16";
        if (args.length > 0)
        {
            config = args[ 0];
        }
        Context ctx = new Context();
        try
        {
            // Remove the storage, if it already exists:
            ctx.destroyStorage( config);
        }
        catch (Exception e)
        {
            // ... ignore the error if the storage did not exist
        }
        // Create the storage and a client to access it:
        ctx.createStorage( config);
        StorageClient storage = ctx.createStorageClient( config);
        // The terms of the test documents, analyzed "somewhere else":
        String[] doc_A_searchIndex = {"tokyo","is","a","citi","that","is","complet","differ","than","what","you","would","expect","as","european","citizen"};
        String[] doc_A_forwardIndex = {"Tokyo","is","a","city","that","is","completely","different","than","what","you","would","expect","as","European","citizen."};
        String doc_A_title = "One day in Tokyo";
        String[] doc_B_searchIndex = {"new", "york", "is", "a", "citi", "with", "dimens", "you", "can", "t", "imagine"};
        String[] doc_B_forwardIndex = {"New", "York", "is", "a", "city", "with", "dimensions", "you", "can't", null, "imagine"};
        String doc_B_title = "A visit in New York";
        String[] doc_C_searchIndex = {"when","i","first","visit","germani","it","was","still","split","into","two","part"};
        String[] doc_C_forwardIndex = {"When","I","first","visited","germany","it","was","still","splitted","into","two","parts."};
        String doc_C_title = "A journey through Germany";
        try
        {
            // Insert all documents within one transaction:
            StorageTransaction transaction = storage.createTransaction();
            insertDoc( transaction, "A", doc_A_searchIndex, doc_A_forwardIndex, doc_A_title);
            insertDoc( transaction, "B", doc_B_searchIndex, doc_B_forwardIndex, doc_B_title);
            insertDoc( transaction, "C", doc_C_searchIndex, doc_C_forwardIndex, doc_C_title);
            transaction.commit();
        }
        catch (Exception e)
        {
            System.err.println( "Failed to insert documents: " + e.getMessage());
            return;
        }
        System.out.println( "done");
    }
}
Create a collection of documents (with the strus analyzer)
Now we show an example that does the same as the previous one, but reads the documents to process from files and analyzes them with the strus document analyzer.
package net.strus.example;
import net.strus.api.*;
import java.io.*;
import java.util.List;

public class CreateCollection
{
    // Read the contents of a file into a string:
    static String readFile( String path) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(path)))
        {
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();
            while (line != null) {
                sb.append(line);
                sb.append(System.lineSeparator());
                line = br.readLine();
            }
            return sb.toString();
        }
    }

    public static void main( String[] args) {
        // Take the storage configuration from the command line, if specified:
        String config = "path=storage; metadata=doclen UINT16";
        if (args.length > 0)
        {
            config = args[ 0];
        }
        Context ctx = new Context();
        try
        {
            // Remove the storage, if it already exists:
            ctx.destroyStorage( config);
        }
        catch (Exception e)
        {
            // ... ignore the error if the storage did not exist
        }
        // Create the storage and a client to access it:
        ctx.createStorage( config);
        StorageClient storage = ctx.createStorageClient( config);
        // Define the document analyzer to use:
        DocumentAnalyzer analyzer = ctx.createDocumentAnalyzer();
        {
            Tokenizer word_tokenizer = new Tokenizer( "word");
            Tokenizer split_tokenizer = new Tokenizer( "split");
            Tokenizer content_tokenizer = new Tokenizer( "content");
            // Normalizers for the search index terms: English stemming,
            // lowercase and diacritical character conversion:
            NormalizerVector stem_normalizer = new NormalizerVector(3);
            stem_normalizer.set( 0, new Normalizer( "stem", "en"));
            stem_normalizer.set( 1, new Normalizer( "lc"));
            stem_normalizer.set( 2, new Normalizer( "convdia", "en"));
            // Normalizer for the forward index terms and the title
            // ("orig" leaves the tokens untouched):
            NormalizerVector orig_normalizer = new NormalizerVector(1);
            orig_normalizer.set( 0, new Normalizer( "orig"));
            // Define the features and attributes to produce and the document
            // contents they are built from:
            analyzer.addSearchIndexFeature( "word", "/doc/text()", word_tokenizer, stem_normalizer);
            analyzer.addForwardIndexFeature( "orig", "/doc/text()", split_tokenizer, orig_normalizer);
            analyzer.defineAttribute( "title", "/doc/title()", content_tokenizer, orig_normalizer);
        }
        try
        {
            // Analyze and insert all ".xml" files in the directory "./data"
            // within one transaction:
            StorageTransaction transaction = storage.createTransaction();
            String datadir = "./data/";
            File folder = new File( datadir);
            File[] listOfFiles = folder.listFiles();
            for (File file : listOfFiles) {
                if (file.isFile())
                {
                    String filename = file.getName();
                    if (filename.endsWith( ".xml"))
                    {
                        // The document identifier is the file name without extension:
                        String docid = filename.substring( 0, filename.length()-4);
                        Document doc = analyzer.analyze( readFile( datadir + filename));
                        transaction.insertDocument( docid, doc);
                    }
                }
            }
            transaction.commit();
        }
        catch (Exception e)
        {
            System.err.println( "Failed to read all input files: " + e.getMessage());
            return;
        }
        System.out.println( "done");
    }
}
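The analyzer configuration in this example addresses the document title with "/doc/title()" and the document text with "/doc/text()". An input file in ./data, for example data/A.xml, might therefore look like this (hypothetical content, matching the test collection of the first example):

<?xml version="1.0" encoding="UTF-8"?>
<doc>
<title>One day in Tokyo</title>
<text>Tokyo is a city that is completely different than what you would expect as European citizen.</text>
</doc>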
Retrieve a ranklist with a simple query consisting of some terms (without analyzer)
The query evaluation scheme used for ranking the results is BM25. The policy that decides which documents are candidates for the result is defined by a selection expression that matches only documents containing all of the query terms. So we search for documents that contain all query terms and rank them with BM25:
package net.strus.example;
import net.strus.api.*;
import java.io.*;
import java.util.List;

public class QueryNoAnalyzer
{
    public static RankVector evaluateQuery( StorageClient storage, QueryEval queryEval, String[] terms)
    {
        Query query = queryEval.createQuery( storage);
        // Define one "seek" feature per query term for the weighting and one
        // "select" expression matching documents that contain all query terms:
        QueryExpression selectexpr = new QueryExpression();
        for (String term : terms) {
            QueryExpression expr = new QueryExpression();
            expr.pushTerm( "word", term);
            selectexpr.pushTerm( "word", term);
            query.defineFeature( "seek", expr, 1.0);
        }
        selectexpr.pushExpression( "contains", terms.length);
        query.defineFeature( "select", selectexpr);
        // Return at most 20 results, starting with the best ranked one:
        query.setMaxNofRanks( 20);
        query.setMinRank( 0);
        return query.evaluate();
    }

    public static QueryEval createQueryEval( Context ctx) {
        QueryEval queryEval = ctx.createQueryEval();
        // Declare the feature that selects the set of candidate documents:
        queryEval.addSelectionFeature( "select");
        // Rank the candidates with BM25 on the "seek" features:
        WeightingConfig weighting = new WeightingConfig();
        weighting.defineParameter( "k1", 2.1);
        weighting.defineParameter( "b", 0.75);
        weighting.defineParameter( "avgdoclen", 1000);
        weighting.defineFeature( "match", "seek");
        queryEval.addWeightingFunction( 1.0, "BM25", weighting);
        // Summarizer to return the title attribute of the matching documents:
        SummarizerConfig sum_title = new SummarizerConfig();
        sum_title.defineParameter( "name", "title");
        queryEval.addSummarizer( "title", "attribute", sum_title);
        // Summarizer to return phrases of the forward index around the matches:
        SummarizerConfig sum_match = new SummarizerConfig();
        sum_match.defineParameter( "type", "orig");
        sum_match.defineParameter( "nof", 4);
        sum_match.defineParameter( "len", 60);
        sum_match.defineFeature( "match", "seek");
        queryEval.addSummarizer( "summary", "matchphrase", sum_match);
        return queryEval;
    }

    public static void main( String[] args) {
        // Take the query terms from the command line ("citi" if not specified):
        if (args.length == 0)
        {
            args = new String[1];
            args[0] = "citi";
        }
        Context ctx = new Context();
        String config = "path=storage";
        StorageClient storage = ctx.createStorageClient( config);
        QueryEval queryEval = createQueryEval( ctx);
        RankVector results = evaluateQuery( storage, queryEval, args);
        // Print the results with their attributes (title and summary):
        System.out.println( "Number of results: " + results.size());
        int pos = 0;
        for (Rank result : results)
        {
            ++pos;
            System.out.println( "rank " + pos + ": " + result.docno() + " " + result.weight() + ":");
            RankAttributeVector attributes = result.attributes();
            for (RankAttribute attribute : attributes)
            {
                System.out.println( "\t" + attribute.name() + ": " + attribute.value());
            }
        }
        System.out.println( "done");
    }
}
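The calls setMaxNofRanks and setMinRank in evaluateQuery determine how many results are returned and at which rank the returned list starts. A minimal sketch of how one might page through the results with these two methods (pageSize and pageNo are illustrative names, not part of the API):

// Sketch: fetch one page of results with the query evaluation scheme above.
// With pageSize 20, pageNo 0 returns ranks 0..19, pageNo 1 ranks 20..39, etc.
int pageSize = 20;
int pageNo = 1;
Query query = queryEval.createQuery( storage);
// ... define the "seek" and "select" features as in evaluateQuery above ...
query.setMaxNofRanks( pageSize);
query.setMinRank( pageNo * pageSize);
RankVector page = query.evaluate();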