Information Mining Worker (傅顺开)

Sunday, July 27, 2008

Roboo: A promising mobile search engine in China

You are welcome to visit us at www.roboo.com and wap.roboo.com (using the Opera browser).

WSDM 2009 CFP

The 2nd International Conference on Web Search and Data Mining (WSDM) calls for papers:
- Abstract due date: Aug 04, 2008
- Full paper due date: Aug 11, 2008

Friday, July 04, 2008

Workshops: 10th October 2008
Papers: 31st October 2008
Tutorials: 30th November 2008
Panels: 21st December 2008
Posters: 11th January 2009
Developers track: 2nd February 2009

Conference date: 20th-24th April 2009

HCIR 2008 CFP

Workshop on Human-Computer Interaction and Information Retrieval

Papers/abstracts due: August 22, 2008
Decisions to authors: September 12, 2008
Camera-ready copy due: October 3, 2008

Location: Redmond, WA, USA

Tuesday, July 01, 2008

Recent paper work

One has been accepted by the International Conference on Data Mining 2008;
Another has been accepted by the ACM SIGIR Workshop on Mobile Search 2008;
One has been submitted to IEEE ICDM 2008.

Looking for a new one!!

Thursday, June 12, 2008

MIR'08 CFP

ACM International Conference on Multimedia Information Retrieval (MIR)

Submission deadline: June 20, 2008
Acceptance notification date: July 14, 2008
Conference date: October 30-31, 2008
Conference venue: Vancouver, Canada

Thursday, June 05, 2008

SDM 2009 CFP

The 9th SIAM International Conference on Data Mining calls for papers.

submission deadline: 2008.10.03
conference date: 2009.04.30
conference venue: Sparks, NV, USA

Friday, May 02, 2008

PAKDD'09 CFP

The 13th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-09) is a major international conference in the areas of data mining and knowledge discovery. It provides an international forum for researchers and industry practitioners to share their new ideas, original research results and practical development experiences from all KDD-related areas including data mining, data warehousing, machine learning, databases, statistics, knowledge acquisition and automatic scientific discovery, data visualization, causal induction and knowledge-based systems. The conference website is at http://www.pakdd2009.org.

Location: Bangkok, Thailand

09 September 2008 Abstract Submission
16 September 2008 Paper Submission
29 September 2008 Workshop Proposal
17 November 2008 Tutorial Proposal
28 November 2008 Tutorial Notification
08 December 2008 Author Notification
09 January 2009 Camera Ready
27 - 30 April 2009 Conference

Sunday, April 27, 2008

Topic Detection and Tracking

1) http://www.itl.nist.gov/iaui/894.01/tests/tdt/
2) http://projects.ldc.upenn.edu/TDT/

Monday, April 21, 2008

Content-based Image Retrieval (CBIR)

Most image retrieval today relies on metadata such as captions or keywords, which is actually text-based retrieval.

"Content-based" means that the search will analyze the actual contents of the image. The term 'content' in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself.

Potential uses for CBIR include:

Art collections

Photograph archives

Retail catalogs

Medical diagnosis

Crime prevention

The military

Intellectual property

Architectural and engineering design

Geographical information and remote sensing systems

Query techniques:

Query by example. An example image is provided to the CBIR system, and the underlying search engine returns images sharing common elements with the provided example. This query technique removes the difficulties that can arise when trying to describe images with words.

Semantic retrieval. The user makes a request like "find pictures of dogs" or even "find pictures of Abraham Lincoln", which is quite difficult for a computer to fulfill. Current CBIR systems generally make use of lower-level features like texture, color, and shape, although some systems take advantage of very common higher-level features like faces. Not every CBIR system is generic; some are designed for a specific domain.

Content comparison techniques:

Color. It retrieves images based on color similarity, e.g. by computing a color histogram for each image that identifies the proportion of pixels within an image holding specific values. This is one of the most widely used techniques because it does not depend on image size or orientation.

Texture. It looks for visual patterns in images and how they are spatially defined. Textures are represented by texels which are then placed into a number of sets, depending on how many textures are detected in the image. These sets not only define the texture, but also where in the image the texture is located.

Shape. It refers to the shape of a particular region that is being sought out. Shapes are often determined by first applying segmentation or edge detection to an image.
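To make the color technique and query by example a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: images are plain lists of (r, g, b) tuples, colors are quantized into 4 bins per channel, and similarity is measured by histogram intersection. The function names are made up for this post and do not come from any particular CBIR system.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized RGB colors.

    pixels: non-empty list of (r, g, b) tuples, channel values in 0..255.
    bins_per_channel is assumed to divide 256 evenly.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    # Proportions rather than raw counts, so image size does not matter.
    return {color_bin: n / total for color_bin, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the summed overlap of two normalized histograms."""
    return sum(min(p, h2.get(color_bin, 0.0)) for color_bin, p in h1.items())

def query_by_example(example_pixels, collection):
    """Rank image names in `collection` (name -> pixel list), best match first."""
    query = color_histogram(example_pixels)
    scored = [(histogram_intersection(query, color_histogram(px)), name)
              for name, px in collection.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy data: two tiny "images" and a larger, mostly red example image.
RED, BLUE = (250, 10, 10), (10, 10, 250)
collection = {
    "sunset": [RED] * 8 + [BLUE] * 2,
    "ocean": [BLUE] * 9 + [RED] * 1,
}
example = [RED] * 40 + [BLUE] * 10
print(query_by_example(example, collection))  # prints ['sunset', 'ocean']
```

Because the histogram stores proportions and ignores where each pixel sits, the 50-pixel example matches the 10-pixel "sunset" perfectly, which is the size and orientation independence mentioned above. Texture and shape comparison need spatial information and are not covered by this sketch.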

Lustre File System

What is Lustre?

Lustre is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by Sun Microsystems, Inc.

The central goal is the development of a next-generation cluster file system which can serve clusters with 10,000's of nodes, provide petabytes of storage, and move 100's of GB/sec with state-of-the-art security and management infrastructure.

Lustre runs on many of the largest Linux clusters in the world, and is included by Sun's partners as a core component of their cluster offering (examples include HP StorageWorks SFS, and the Cray XT3 and XD1 supercomputers). Today's users have also demonstrated that Lustre scales down as well as it scales up, and runs in production on clusters as small as 4 and as large as 25,000 nodes.

Reference Resource:

Lustre wiki on Sun
Sun's official entrance

WebKDD 2008 CFP

10th SIGKDD Workshop on Web Mining and Web Usage Analysis (WEBKDD'08)

submission deadline: 2008. 5. 26
conference date: 2008. 8. 24 - 8. 27
conference venue: Las Vegas, NV, USA

Monday, April 14, 2008

ProActive: A powerful middleware for cluster computing

ProActive is a middleware for parallel, distributed and multi-threaded computing. It provides a comprehensive framework and programming model to simplify the programming and execution of parallel applications: within multi-core processors, distributed on LAN, on clusters and data centers, on intranet and Internet Grids.

The core part of ProActive is its Active Object Model. Programming with ProActive is primarily a matter of dealing with active objects. A distributed or concurrent application built using ProActive is composed of a number of active objects.
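For readers who have not met the idea before, the following is a rough sketch of the general active-object pattern in Python. It is not ProActive's API (ProActive is a Java middleware); the class and method names below are hypothetical and only illustrate the concept: an object that owns a thread, queues incoming method calls, and hands back a future for each result.

```python
import queue
import threading
from concurrent.futures import Future

class ActiveObject:
    """Wraps a plain object; every method call runs on one private thread."""

    def __init__(self, target):
        self._target = target
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Serve queued requests one at a time, in arrival order.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            future, name, args = item
            try:
                future.set_result(getattr(self._target, name)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, name, *args):
        """Queue a method call and return a Future immediately."""
        future = Future()
        self._requests.put((future, name, args))
        return future

    def stop(self):
        """Ask the serving thread to finish and wait for it."""
        self._requests.put(None)
        self._thread.join()


class Tally:
    """An ordinary, non-thread-safe object used as the wrapped target."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value


tally = ActiveObject(Tally())
futures = [tally.call("add", i) for i in range(1, 6)]  # returns at once
results = [f.result() for f in futures]                # blocks until served
tally.stop()
print(results)  # prints [1, 3, 6, 10, 15]
```

The caller never touches the wrapped object directly, so the object needs no locks of its own: all requests are served sequentially by its private thread. A real middleware adds the distributed part (remote references, deployment, migration), which this single-process sketch leaves out.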

Saturday, April 12, 2008

WI 2008 CFP

Overview
The 2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI-08) will be held jointly with the 2008 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT-08). The IEEE/WIC/ACM 2008 joint conferences are organized by the University of Technology, Sydney, Australia, and sponsored by the IEEE Computer Society Technical Committee on Intelligent Informatics (TCII), the Web Intelligence Consortium (WIC), and ACM-SIGART.

Important Dates
Workshop proposal submission: April 10, 2008
Electronic submission of full papers: July 10, 2008
Tutorial proposal submission: July 10, 2008
Workshop paper submission: July 30, 2008
Notification of paper acceptance: September 3, 2008
Camera-ready of accepted papers: September 30, 2008
Workshops: December 9, 2008
Conference: December 9 - 12, 2008

Thursday, March 06, 2008

Hard vs. fuzzy clustering

"Data clustering is the process of dividing data elements into classes or clusters so that items in the same class are as similar as possible, and items in different classes are as dissimilar as possible. Depending on the nature of the data and the purpose for which clustering is being used, different measures of similarity may be used to place items into classes, where the similarity measure controls how the clusters are formed. Some examples of measures that can be used as in clustering include distance, connectivity, and intensity.

In hard clustering, data is divided into distinct clusters, where each data element belongs to exactly one cluster. In fuzzy clustering, data elements can belong to more than one cluster, and associated with each element is a set of membership levels. These indicate the strength of the association between that data element and a particular cluster. Fuzzy clustering is a process of assigning these membership levels, and then using them to assign data elements to one or more clusters." - wikipedia

Compared with hard clustering, fuzzy clustering has more applications, thanks to its flexibility and its fit with the graded nature of decision making.
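The difference can be seen on a single data point. The sketch below assumes the cluster centers are already known and only shows the assignment step: hard clustering picks the nearest center, while the fuzzy version computes membership levels with the membership rule from fuzzy c-means. The choice of fuzzy c-means and of the fuzzifier m = 2 is mine; the quoted text does not name a specific algorithm.

```python
from math import dist

def hard_assignment(point, centers):
    """Index of the single nearest center (hard clustering)."""
    return min(range(len(centers)), key=lambda j: dist(point, centers[j]))

def fuzzy_memberships(point, centers, m=2.0):
    """Membership level of `point` in each cluster; the levels sum to 1.

    Uses the fuzzy c-means rule u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)),
    where d_j is the distance to center j and m > 1 is the fuzzifier.
    """
    distances = [dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in distances:
        hit = distances.index(0.0)
        return [1.0 if j == hit else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in distances)
            for d_j in distances]

centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assignment(point, centers))    # index of the nearest center
print(fuzzy_memberships(point, centers))  # one membership level per cluster
```

For this point the distances to the two centers are 1 and 3, so the hard assignment is cluster 0, while the fuzzy memberships come out at roughly 0.9 and 0.1. A full fuzzy clustering algorithm would alternate this step with re-estimating the centers, which is omitted here.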

Tuesday, February 26, 2008

IEEE SMC 2008 CFP

IEEE SMC 2008 will be held in Singapore from October 12 to 15.

Important dates:

Special session proposals due: March 2, 2008 (Sunday)
Papers due (full-length papers only): March 16, 2008 (Sunday)
Notification of acceptance/rejection: May 15, 2008 (Thursday)

Hypertable

Hypertable is an open source project modeled after Google's Bigtable; it was started at Zvents, not at Google. Its goal is the design and implementation of a high-performance, scalable, distributed storage and processing system for structured and unstructured data.

As of this writing, the latest version is 0.9.0.3-alpha.

Tuesday, February 19, 2008

Some references for MapReduce (from Wikipedia)

Papers
"MapReduce: Simplified Data Processing on Large Clusters" — paper by Jeffrey Dean and Sanjay Ghemawat; from Google Labs
"Interpreting the Data: Parallel Analysis with Sawzall" — paper by Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan; from Google Labs
"Google's MapReduce Programming Model -- Revisited" — paper by Ralf Lammel; from Microsoft
"Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters" — paper by Hung-Chih Yang, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker; from Yahoo and UCLA; published in Proc. of ACM SIGMOD, pp. 1029--1040, 2007. (This paper shows how to extend MapReduce for relational data processing.)
FLuX: the Fault-tolerant, Load Balancing eXchange operator from UC Berkeley provides an integration of partitioned parallelism with process pairs. This results in a more pipelined approach than Google's MapReduce with instantaneous failover, but with additional implementation cost.

Articles
"How Google Works - Reducing Complexity" — article from Baseline magazine
"Can Your Programming Language Do This?" — article from the Joel on Software weblog
Nutch MapReduce — article about MapReduce in Nutch from Tom White's weblog
Cat MapReduce — article about MapReduce in Cat from the Cat project wiki.
"Simple Map Reduce in Ruby" - article about using SimpleMapReduce on Ruby's Rinda which uses DrbRuby
"MapReduce: A major step backwards" - column about advances in database technology compared to MapReduce.

Software
Hadoop — open source MapReduce implementation from Apache
IBM MapReduce Tools for Eclipse — a plug-in that supports the creation of MapReduce applications within Eclipse.
QtConcurrent Open Source C++ MapReduce (non-distributed) implementation from Trolltech
Skynet Ruby Map/Reduce Framework
Retrieved from "http://en.wikipedia.org/wiki/MapReduce"

Saturday, February 16, 2008

adMyself

I just created a page about myself using Google Page Creator.

Sunday, February 03, 2008

IEEE ICDM'08 CFP

[Overview]
The 2008 edition of the IEEE International Conference on Data Mining series (ICDM 2008) will be held in Pisa, Italy, from December 15 through 19, 2008.
The International Conference on Data Mining series (ICDM) is well established as a top ranked research conference in data mining, providing a premier forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences.
The conference covers all aspects of data mining, including algorithms, software and systems, and applications. In addition, ICDM draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high performance computing. By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference will feature workshops, tutorials, panels and, new for this year, the ICDM data mining contest.

[Important Dates]
July 7, 2008 Deadline for paper submission
September 15, 2008 Notification to authors
October 7, 2008 Deadline for camera-ready copies
December 15 – 19, 2008 Conference