Data Warehousing in Relation to LBS
It is important for the success of a business to secure increased customer loyalty and lower customer
turnover rates (known as churn rates). Thus, it is of interest to build better knowledge of the factors
that influence the churn rate, since this makes it possible to react better to customer behavior.
For example, pricing models for mobile phone usage may be designed based on detailed knowledge about customer
behavior.
Mobile services of the future are expected to know and use the customers' needs and usage patterns.
Today, project participant Sonofon has a service called Mobilportalen, the customers' gateway to the mobile
Internet. This service may be developed to use the customer location, the time of day, and the previous
queries by the user to offer services that are useful to the customer here and now. Sonofon is already
logging information about the customers' use of cell phones, e.g., locations at dial-up, call frequency,
call length as well as administrative customer data, providing a solid foundation for substantial Business
Intelligence analyses for both short- and long-term decisions.
Existing data models used in Business Intelligence (BI) only support relatively simple data structures
that are unable to capture the complex nature of advanced data types such as geographical data and sequential
web- and tele-logs. The challenges in modeling geo-related data are both the representation of data as
graphs, maps, and coordinates, and the imprecision inherent in that representation, which stems from
imprecise positioning techniques, variable temporal validity, and the varying level of detail used for
positioning in different areas. The need for efficient processing of advanced queries must also be taken into account.
BI analyses have typically been used for traditional business analysis such as sales by product, area,
price, customer type, etc. However, the mobile telephony business has huge and rapidly growing amounts
of data that can support more advanced analyses. Long-term analyses are used for consolidation and enlargement
of business areas. Short-term analyses are used in sales and call-centers, where the information is used
to offer the customer exactly the offers and services that will sustain or increase customer satisfaction.
Finally, mobile service analyses can be used to customize services on-the-fly based on the customer's
actual situation and past behavior.
The above-mentioned analyses typically require fast response times. A well-known solution to this is
to use precomputed data for faster response times. However, precomputation has so far only been used to
support relatively simple queries. The complex, dynamic data types found in location-based
services render the known methods for precomputation inapplicable. The project aims to develop new methods
that can handle both complex and dynamic data.
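To make the idea of precomputation concrete, the following is a minimal sketch in Python, not a method from the project. It assumes a hypothetical call log of (cell, hour, seconds) records; the names `CALLS`, `precompute`, and `total_seconds` are illustrative only. The log is aggregated once, and later roll-up queries read the small aggregate instead of scanning the raw data.

```python
from collections import defaultdict

# Hypothetical call-log records: (cell_id, hour_of_day, call_seconds).
CALLS = [
    ("cell_a", 9, 120),
    ("cell_a", 9, 60),
    ("cell_a", 17, 300),
    ("cell_b", 9, 45),
]


def precompute(calls):
    """Aggregate once: total seconds and call count per (cell, hour)."""
    cube = defaultdict(lambda: [0, 0])
    for cell, hour, seconds in calls:
        entry = cube[(cell, hour)]
        entry[0] += seconds  # running sum of call length
        entry[1] += 1        # running count of calls
    return dict(cube)


def total_seconds(cube, cell=None, hour=None):
    """Answer a roll-up query from the aggregate, not from the raw log.

    Passing None for a dimension means "all values" of that dimension.
    """
    return sum(
        secs
        for (c, h), (secs, _count) in cube.items()
        if (cell is None or c == cell) and (hour is None or h == hour)
    )


CUBE = precompute(CALLS)
```

This works well for simple, static dimensions such as cell and hour. It does not address the case described above, where the dimensions are themselves geographical, imprecise, or continuously changing.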
Representation of Geo-Related Content in Location-Based Services
To be able to offer new location-based services effectively and efficiently, it is important
to avoid a software development strategy and software architecture where a new, monolithic, stove-pipe-like
system is developed for each new service. With such systems, there is little reuse when a new service
is developed.
To obtain reuse of data across location-based services, an integrated representation, or data model,
of all relevant geo-referenced content will be developed in the project. Such a data model will promote
reuse of content and lower-level services when new location-based services are developed.
Project participant Euman has years of experience with the Danish (and Nordic) transportation infrastructure.
The Danish Road Directorate maintains more than 1000 attribute values for each position on each road in
Denmark. A substantial fraction of these need to be reflected in an integrated data model. In addition
to these attributes, which are closely related to the roads themselves, the data model must capture the
``real content,'' which is much more voluminous and open-ended. For example, such content includes information
about stores, e.g., their opening hours, available inventory, and current sales, and about cultural events,
e.g., the artists, attendance prices, and seat availability. While most geo-related content is stationary
or changes location only at discrete times, some content changes continuously. The locations of the service
users are an example of the latter. Such content must also be captured in the data model.
As a further complicating factor, it turns out that it is beneficial to maintain at least two types
of representations of the same geo-referenced content: a two-dimensional native space representation where
coordinates are associated with the content, and a representation of the geo-referenced content in multiple
one-dimensional spaces determined by the existing transportation network.
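The relation between the two representations can be sketched as follows. This is an illustrative assumption, not the project's data model: a road is taken to be a polyline of planar (x, y) vertices, and the hypothetical function `project_onto_road` maps a two-dimensional point to a one-dimensional offset along that road, together with the point's distance from the road.

```python
import math


def project_onto_road(point, polyline):
    """Return (offset_along_road, distance_to_road) for a 2D point.

    polyline is a list of (x, y) vertices; the offset is measured along
    the road from its first vertex.
    """
    px, py = point
    best = (0.0, math.inf)
    travelled = 0.0  # road length covered by the segments seen so far
    for (ax, ay), (bx, by) in zip(polyline, polyline[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue  # skip degenerate segments
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = ((px - ax) * dx + (py - ay) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        dist = math.hypot(px - cx, py - cy)
        if dist < best[1]:
            best = (travelled + t * seg_len, dist)
        travelled += seg_len
    return best
```

With a road `[(0, 0), (10, 0), (10, 10)]`, the point `(12, 5)` maps to an offset of 15 along the road at distance 2 from it. A full model would also need a road identifier and a rule for points that are close to several roads.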
Yet another complication occurs because uncertainty is inherent to all geo-related content and must
be taken into account. For example, user locations are sampled according to a variety of protocols. Due
to the sampling, complete traces of the users' movements are unavailable; rather, the service only knows
the locations of the users at discrete times. Additionally, the samples themselves are imprecise. The
sample imprecision is dependent on the technology used and the circumstances under which a specific technology
is used.
If a precise (but incorrect) trace is maintained for each user, queries may return suboptimal results.
On the other hand, if an overly imprecise record of the positions is kept, query results will also be
suboptimal. Maintaining a very accurate record of each user's trace will yield the best query results,
but may also lead to poor query performance and large volumes of updates. Thus, the model must maintain
a representation that is adequately precise, and it must be able to maintain content with different precisions.
The envisioned data model is conceptually simple and also permits the provision of efficient services,
which involves the processing of large loads of updates and complex queries.
Efficient Update and Advanced Querying of Geo-Related Content
In location-based service scenarios that involve large amounts of content, that rely on access to up-to-date
information, and where continuous variables (e.g., the locations of users) are monitored, updates represent
a very substantial challenge. Due to the volumes of data, the data must be assumed to be disk resident;
and to obtain adequate query performance, some form of indexing must be employed. Existing hardware and
software solutions for this type of scenario can accommodate relatively few updates. This presents a serious
problem for location-based services (and other services that rely on the monitoring of continuous variables
via some form of sensor).
Advanced queries include, e.g., different types of nearest-neighbor queries, monochromatic and bichromatic
reverse nearest neighbor queries, and queries that retrieve travel plans based on arbitrary content. In
location-based services, it is natural to not simply compute such queries once, but instead to activate
such queries and then update the results when the underlying data (locations of users and other relevant
content) changes.
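The query types named above can be illustrated with a brute-force sketch. It assumes planar coordinates and Euclidean distance, and the names `nearest`, `bichromatic_rnn`, and `monochromatic_rnn` are hypothetical. It is meant only to pin down the semantics; every call scans all objects, which is exactly the cost that indexing would have to avoid.

```python
import math


def nearest(point, candidates):
    """Name of the candidate closest to point (linear scan)."""
    return min(candidates, key=lambda name: math.dist(point, candidates[name]))


def bichromatic_rnn(site, sites, users):
    """Users (one object type) whose nearest site (another type) is `site`."""
    return sorted(u for u, pos in users.items() if nearest(pos, sites) == site)


def monochromatic_rnn(obj, objects):
    """Objects whose nearest other object of the same type is `obj`."""
    result = []
    for name, pos in objects.items():
        if name == obj:
            continue
        # An object is never its own neighbor, so exclude it from the scan.
        others = {n: p for n, p in objects.items() if n != name}
        if nearest(pos, others) == obj:
            result.append(name)
    return sorted(result)


# Hypothetical stores (sites) and service users.
sites = {"s1": (0, 0), "s2": (10, 0)}
users = {"u1": (1, 1), "u2": (9, 0), "u3": (4, 0)}

before = bichromatic_rnn("s1", sites, users)
users["u3"] = (6, 0)  # a user moves, so the active query is re-evaluated
after = bichromatic_rnn("s1", sites, users)
```

Here the result for store `s1` changes from `u1` and `u3` to `u1` alone once `u3` moves. Re-running the full query on every change is the naive form of an active query; updating the result incrementally is the harder problem.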
Traditional indexing techniques necessitate explicit index updates when changes occur in the data.
This renders the use of indices for moving objects impractical, if not impossible. Two general
approaches may be taken towards accommodating continuous change in the indexed data: (i) techniques may be
applied that generate fewer updates, or (ii) existing techniques may be enhanced to support rapid,
non-bulk updates. Advanced query processing techniques that exploit indexing are largely
unexplored. The use of approximation techniques in relation to both updates and queries also represents
a very interesting direction.
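One possible way of generating fewer updates, shown here only as a sketch under stated assumptions, is to store each position as a linear function of time and to send an update only when the actual position drifts from the prediction by more than a threshold. The class `TrackedObject`, its fields, and the threshold value are hypothetical, and this is not presented as the project's technique.

```python
import math


class TrackedObject:
    """Last reported position, velocity, and time for one moving object."""

    def __init__(self, pos, vel, t, threshold):
        self.pos, self.vel, self.t = pos, vel, t
        self.threshold = threshold  # tolerated deviation, in position units
        self.updates_sent = 0

    def predicted(self, t):
        """Position at time t, extrapolated linearly from the last report."""
        dt = t - self.t
        return (self.pos[0] + self.vel[0] * dt, self.pos[1] + self.vel[1] * dt)

    def observe(self, actual, vel, t):
        """Report a new sample; return True only if an update was needed."""
        if math.dist(actual, self.predicted(t)) <= self.threshold:
            return False  # prediction is still good enough, no index update
        self.pos, self.vel, self.t = actual, vel, t
        self.updates_sent += 1
        return True
```

The threshold trades precision for update volume: a larger value means fewer index updates but a less accurate stored position, which is where approximation enters both updates and queries.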