This page last changed on Feb 20, 2008 by martinmueller@northwestern.edu.

Below are the architecture-related topics we should discuss and reach consensus on.
Please feel free to edit this document.

Architecture-Related Open Questions

Some of these questions were discussed at the meeting on the 24th of Feb., but not everyone was present. For the
benefit of all, they remain in the open category. Please add further questions for discussion here.

  1. Do we need a Proxy?

    Justification for a proxy:

    1. Moves the business logic for display, analytics, search, etc. to an intermediate system that brokers requests to the appropriate back-end systems.
    2. Provides a platform for horizontal scaling and failover.
    3. Provides a choke point for access control. (AK)
  2. Are we targeting the web browser as the lowest common denominator?
  3. Are we building multiple interfaces, e.g. Java-based, Flash, or GWT-based?
  4. Are we building APIs that are open and platform-independent?
  5. Can we provide APIs flexible enough to create Yahoo Pipes-like applications and, at the same time, use them to create WordHoard or Nora-OL-like interfaces?
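
To make the proxy justification above concrete, here is a minimal sketch in Java of a middle tier that checks access in one place and then brokers each request to a registered back end. Everything in it is an assumption for illustration: the class name ProxySketch, the request kinds, and the user allow-list are hypothetical and not part of any agreed design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical middle tier: one entry point that checks access, then brokers to a back end. */
class ProxySketch {
    // Back ends keyed by request kind, e.g. "search" or "analytics" (illustrative names).
    private final Map<String, Function<String, String>> backends = new HashMap<>();
    // Assumed access-control model: a plain allow-list of user names.
    private final Set<String> allowedUsers;

    ProxySketch(Set<String> allowedUsers) {
        this.allowedUsers = allowedUsers;
    }

    /** Registers the back end that answers requests of the given kind. */
    void register(String kind, Function<String, String> backend) {
        backends.put(kind, backend);
    }

    /** Single choke point: every client request passes the access check before routing. */
    String handle(String user, String kind, String query) {
        if (!allowedUsers.contains(user)) {
            throw new SecurityException("access denied for " + user);
        }
        Function<String, String> backend = backends.get(kind);
        if (backend == null) {
            throw new IllegalArgumentException("no back end for " + kind);
        }
        return backend.apply(query);
    }
}
```

Because clients only ever call handle, back ends could be added, replaced, or replicated behind the proxy without changing client code, which is the scaling and failover argument made above.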

Questions not discussed

  1. Accessibility: with visualization in the mix, accessibility becomes a difficult topic, but what is the mandate for the
    data cell?
  2. Terracotta (http://www.terracotta.org/) for large virtual heaps, which could be useful for large sparse matrices.
  3. Java cache solutions such as OSCache and EHCache (http://java-source.net/open-source/cache-solutions) at the proxy. How do they compare with the Hibernate cache at the DB end?
  4. Parallel computing: MapReduce (http://labs.google.com/papers/mapreduce.html).
  5. Can we abstract the database functionality into a set of interfaces that would allow us to experiment with different data models and database technologies?
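
As one possible reading of the last question above, the sketch below hides storage behind a small Java interface so that implementations can be swapped. The names TextStore and InMemoryTextStore and the two operations are hypothetical, chosen only to illustrate the idea; a real interface would come out of the use cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical storage contract; callers never see the underlying data model. */
interface TextStore {
    void put(String workId, String text);

    /** Returns the ids of all works whose text contains the term, ignoring case. */
    List<String> findWorksContaining(String term);
}

/** In-memory stand-in; a relational or XML-backed class could implement the same interface. */
class InMemoryTextStore implements TextStore {
    // TreeMap keeps work ids sorted, so results come back in a stable order.
    private final Map<String, String> works = new TreeMap<>();

    @Override
    public void put(String workId, String text) {
        works.put(workId, text);
    }

    @Override
    public List<String> findWorksContaining(String term) {
        List<String> hits = new ArrayList<>();
        String needle = term.toLowerCase();
        for (Map.Entry<String, String> entry : works.entrySet()) {
            if (entry.getValue().toLowerCase().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Code written against TextStore would not need to change if the in-memory class were replaced by one backed by a different database technology, which is what would make the experimentation feasible.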

Use Cases

  1. Where are they? The architecture discussion should not happen in a vacuum, and the result should not be a mere sum of all
    the functionality that WordHoard and Nora provide.

System Components

  1. Client: A web browser or an end user application that makes use of the back end data sources and analysis engine.
  2. Proxy: A middle tier that federates client requests and is responsible for session and data aggregation.
  3. Database: A set of indices and/or relational/XML databases for resolving analytics, relational, and XPath queries.
  4. Analytics Engine: (D2K Server) The back-end server that runs various data-mining algorithms.
  5. Workflow Engine: For Text ingestion.

Evaluation Criteria

  1. Response time
  2. Scalability

Client Capability Expectations

  1. What capabilities do we expect from the clients? Ability to sort; HTML rendering; display of graphs/visualizations; display of images.
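
If clients end up differing in these capabilities, one option is for each client to declare what it supports so the server side can pick a result format. The Java sketch below assumes exactly that; the enum values mirror the list above, while the class name, the method chooseResultFormat, and the format names are hypothetical.

```java
import java.util.Set;

/** Hypothetical capability model: the client declares what it can do, the server adapts. */
class ClientCapabilities {
    // One constant per capability listed in this section.
    enum Capability { SORT, HTML_RENDERING, GRAPH_DISPLAY, IMAGE_DISPLAY }

    /** Picks the richest result format the client says it can display (format names are illustrative). */
    static String chooseResultFormat(Set<Capability> caps) {
        if (caps.contains(Capability.GRAPH_DISPLAY)) {
            return "graph";
        }
        if (caps.contains(Capability.HTML_RENDERING)) {
            return "html-table";
        }
        return "plain-text";
    }
}
```

A browser-only client might declare HTML_RENDERING and SORT and receive tables, while a richer client that adds GRAPH_DISPLAY would receive graph data instead.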

Document generated by Confluence on Apr 19, 2009 15:04