Functional Overview and Item Normalization
- Acquisition: Acquisition, the collection of information from various sources, is the first and main function of information retrieval. The desired information is first gathered from sources such as books, documents, databases, and journals.
- Content analysis: The second step of an information retrieval system is to analyze the acquired information. At this step, a decision is made about whether each collected document is valuable or not.
- Content presentation: Information presentation is the means of presenting information to the user. Information should be presented clearly and effectively so that users can understand it easily. Tools such as the catalogue, bibliography, index, and current awareness service help greatly with this.
- Creation of file/store: In this stage, the library authority creates a new file to store the collected information so that it is ready for presentation, and organizes these files systematically.
- Creation of search methods: In this stage, the authority decides what kind of search logic to use for searching and retrieving information.
- Dissemination: The last stage of an information retrieval system is dissemination, the act of spreading information widely. In this stage, the library authority disseminates information to users in a systematic way.
- Catalogue: A catalogue is a list of the books held by one or more libraries, which lets us identify the location of any book. Each entry contains the author name, title, edition, number of volumes, subject, ISBN, etc.
- Index: An index is an alphabetical list, usually at the back of a book, showing where particular topics are mentioned in the book. It is a very important tool for information retrieval.
- Abstract: An abstract is a concise and accurate representation of the contents of a document. It also serves as a retrieval tool.
- Bibliography: A bibliography is also a list of books, but it is not confined to a particular library. It also acts as a retrieval tool.
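- Authority file: An authority file is a list that records the standardized (authorized) forms of headings used in a collection, such as names and subjects, and may also record call numbers and class numbers. It is also an important tool for retrieving information.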
- Computer: an electronic machine that can store and work with large amounts of information.
- CD-ROM: a compact disc used as a read-only optical memory device for a computer system.
- Hard Disk: a rigid non-removable magnetic disk with a large data storage capacity.
- Floppy Disk: a flexible removable magnetic disk (typically encased in a hard plastic shell) for storing data.
- Internet: a global computer network providing a variety of information and communication facilities, consisting of interconnected networks using standardized communication protocols.
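The catalogue and index described above are, at heart, simple lookup structures. The following Python sketch is one possible illustration, not a description of any real cataloguing software: `CatalogueRecord`, `find_locations`, and `build_index` are hypothetical names, the `location` field is an assumed stand-in for a shelf or call-number location, and the sample records use placeholder values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueRecord:
    """One catalogue entry; `location` (shelf/call number) is an assumed field."""
    author: str
    title: str
    edition: str
    volumes: int
    subject: str
    isbn: str
    location: str


def find_locations(catalogue, author=None, title=None):
    """Return locations of records whose author and/or title match (case-insensitive)."""
    matches = []
    for record in catalogue:
        if author is not None and record.author.lower() != author.lower():
            continue
        if title is not None and record.title.lower() != title.lower():
            continue
        matches.append(record.location)
    return matches


def build_index(pages):
    """Build a back-of-book index: alphabetical term -> sorted page numbers.

    `pages` maps a page number to the terms mentioned on that page.
    """
    index = defaultdict(set)
    for page, terms in pages.items():
        for term in terms:
            index[term.lower()].add(page)
    # Sort terms alphabetically, as in a printed index.
    return {term: sorted(index[term]) for term in sorted(index)}


if __name__ == "__main__":
    # Placeholder records for illustration only.
    catalogue = [
        CatalogueRecord("Smith", "Retrieval Basics", "2nd", 1,
                        "Information retrieval", "ISBN-PLACEHOLDER-1", "Shelf A3"),
        CatalogueRecord("Rao", "Cataloguing Practice", "1st", 2,
                        "Cataloguing", "ISBN-PLACEHOLDER-2", "Shelf B7"),
    ]
    print(find_locations(catalogue, author="smith"))  # ['Shelf A3']
    print(build_index({12: ["Stemming", "Tokens"], 3: ["tokens"], 40: ["Zoning"]}))
    # {'stemming': [12], 'tokens': [3, 12], 'zoning': [40]}
```

Lowercasing terms merges "Tokens" and "tokens" into one index entry, which is usually what a reader of a printed index expects.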
A total Information Storage and Retrieval System is composed of four major functional processes:
1) Item Normalization
2) Selective Dissemination of Information (i.e., “Mail”)
3) Archival Document Database Search
4) Index Database Search, along with the Automatic File Build process that supports Index Files.
The first step in any integrated system is to normalize the incoming items to a standard format. Item normalization provides logical restructuring of the item. Additional operations during item normalization are needed to create a searchable data structure: identification of processing tokens (e.g., words), characterization of the tokens, and stemming (e.g., removing word endings) of the tokens.
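The three normalization operations can be sketched in Python as below. This is a minimal illustration under stated assumptions: tokens are taken to be runs of letters and digits, characterization only distinguishes words from numbers, and the stemmer is a naive suffix-stripper driven by a hypothetical `SUFFIXES` list rather than a published algorithm such as Porter's. All function names are hypothetical.

```python
import re

# Hypothetical suffix list for a naive stemmer; published stemmers are far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def identify_tokens(text):
    """Split text into processing tokens: runs of letters or digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def characterize(token):
    """Tag a token as 'number' or 'word' (a simplified characterization step)."""
    return "number" if token.isdigit() else "word"


def stem(word):
    """Strip the first matching suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Return (term, characterization) pairs ready for a searchable data structure."""
    result = []
    for token in identify_tokens(text):
        kind = characterize(token)
        term = token.lower()
        if kind == "word":
            term = stem(term)
        result.append((term, kind))
    return result


if __name__ == "__main__":
    print(normalize("Indexing 2 retrieved documents"))
    # [('index', 'word'), ('2', 'number'), ('retriev', 'word'), ('document', 'word')]
```

A naive stemmer over-strips ("retrieved" becomes "retriev"); this matters less when queries pass through the same `normalize` function, so stored terms and query terms are reduced in the same way.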
Multi-media adds an extra dimension to the normalization process. In addition to normalizing the textual input, the multi-media input also needs to be standardized, and there are many standards that can be applied. If the input is video, the likely digital standards are MPEG-2, MPEG-1, AVI, or Real Media. MPEG (Moving Picture Experts Group) standards are the most universal standards for higher-quality video, whereas Real Media is the most common standard for lower-quality video on the Internet. Audio standards are typically WAV or Real Media (Real Audio). Image formats range from JPEG to BMP.
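Before a multimedia item can be normalized, the system has to know which standard it arrived in. One common approach, assumed here rather than taken from the text, is to inspect the file's leading bytes ("magic numbers"). The hypothetical `detect_format` function below checks the signatures of the formats named above; it only separates MPEG-1 from MPEG-2 for program streams, and a production system would more likely rely on a dedicated media library.

```python
def detect_format(header: bytes) -> str:
    """Guess an item's media standard from its first bytes (pass at least 12 bytes)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if header.startswith(b"BM"):
        return "BMP"
    if header.startswith(b".RMF"):
        return "RealMedia"
    if header[:4] == b"RIFF":
        # RIFF container: bytes 8-11 hold the form type.
        form = header[8:12]
        if form == b"WAVE":
            return "WAV"
        if form == b"AVI ":
            return "AVI"
    if header.startswith(b"\x00\x00\x01\xba") and len(header) > 4:
        # MPEG program stream pack header: leading bits of the next byte give the version.
        if (header[4] & 0xF0) == 0x20:
            return "MPEG-1"
        if (header[4] & 0xC0) == 0x40:
            return "MPEG-2"
    return "unknown"


if __name__ == "__main__":
    print(detect_format(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # WAV
```

In use, the header would be read with something like `open(path, "rb").read(12)`, and the returned label would select the matching normalization routine.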
The next process is to parse the item into logical sub-divisions that have meaning to the user. This process, called “Zoning,” is visible to the user and is used to increase the precision of a search and optimize the display. A typical item is subdivided into zones, which may overlap and can be hierarchical, such as Title, Author, Abstract, Main Text, Conclusion, and References. The zoning information is passed to the processing token identification operation to store the information, allowing searches to be restricted to a specific zone. For example, if the user is interested in articles discussing “Einstein,” then the search should not include the Bibliography, which could include references to articles written by “Einstein.”
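Zone-restricted search can be sketched by recording the zone alongside each token when the index is built, as the paragraph above describes. In this minimal Python sketch, items are assumed to be dictionaries mapping zone names to text (overlapping and hierarchical zones are ignored), and `build_zoned_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict


def build_zoned_index(items):
    """Map each lowercased token to the set of (item_id, zone) pairs where it occurs."""
    index = defaultdict(set)
    for item_id, zones in enumerate(items):
        for zone, text in zones.items():
            for token in re.findall(r"\w+", text.lower()):
                index[token].add((item_id, zone))
    return index


def search(index, term, allowed_zones=None):
    """Return sorted ids of items containing `term`, optionally only in `allowed_zones`."""
    postings = index.get(term.lower(), set())
    return sorted({item_id for item_id, zone in postings
                   if allowed_zones is None or zone in allowed_zones})


if __name__ == "__main__":
    items = [
        {"Title": "Relativity explained", "Main Text": "Einstein proposed special relativity."},
        {"Title": "Photon statistics", "Main Text": "Counting photons.", "Bibliography": "Einstein, A."},
    ]
    index = build_zoned_index(items)
    print(search(index, "Einstein"))                                        # [0, 1]
    print(search(index, "Einstein", allowed_zones={"Title", "Main Text"}))  # [0]
```

Excluding the Bibliography zone drops the second item, whose only mention of Einstein is a reference.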
Systems determine words by dividing input symbols into three classes:
1) Valid word symbols
2) Inter-word symbols
3) Special processing symbols.
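The three symbol classes can be illustrated with a small tokenizer. The sketch below assumes that letters and digits are valid word symbols, that the hyphen is the only special processing symbol (kept when it joins two valid symbols, as in "moon-base", and otherwise treated as a separator), and that everything else is an inter-word symbol. These choices and the names `SPECIAL`, `classify`, and `tokenize` are illustrative assumptions; real systems define the classes in different ways.

```python
from typing import List

SPECIAL = {"-"}  # assumed set of special processing symbols


def classify(ch: str) -> str:
    """Assign a character to one of the three symbol classes."""
    if ch.isalnum():
        return "valid"
    if ch in SPECIAL:
        return "special"
    return "inter-word"


def tokenize(text: str) -> List[str]:
    """Split text into words using the three-class symbol scheme."""
    tokens: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Emit the word built so far, if any.
        if current:
            tokens.append("".join(current))
            current.clear()

    for i, ch in enumerate(text):
        kind = classify(ch)
        if kind == "valid":
            current.append(ch)
        elif kind == "special":
            # Keep the hyphen only when valid symbols sit on both sides of it.
            joins = (0 < i < len(text) - 1
                     and classify(text[i - 1]) == "valid"
                     and classify(text[i + 1]) == "valid")
            if joins:
                current.append(ch)
            else:
                flush()
        else:
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize("Moon-base, launch -- now!"))  # ['Moon-base', 'launch', 'now']
```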