Technology

Newstin is an innovative technology that takes a completely new approach to content organization. Newstin technology and its service-oriented architecture are the foundation of a unique system that features fully scalable, real-time semantic, multi-language and cross-language document categorization. Newstin’s patented technology has the potential to become the core platform for organizing any unstructured textual data, including data from all sources on the Internet and potentially the hidden Web.

Newstin is a powerful engine that harnesses a variety of cutting-edge technologies and combines linguistic processing with semantic analysis, multilevel content categorization and cross-language taxonomy structures. Applications of Newstin technology use its inherent ability to take context into account in addition to conventional keyword approaches.

Newstin is the largest news database/catalogue in the world, currently comprising about 100 million documents and 5 billion metadata items, and it is constantly growing. The Newstin article collection is continuously updated from over 160,000 global, weighted sources selected from a pool of over 5.5 million preprocessed sources in 11 languages. As many as 250,000 articles or more are fully processed each day into 1.25 million categories in 15 supported editions: US, UK, Indian, French, German, Italian, Spanish, Mexican, Portuguese, Brazilian, Czech, Russian, Arabic, Chinese and Korean. More languages and editions are coming soon.

Newstin is a complex system incorporating content retrieval, metadata processing, analysis and visualization. The extensive operation behind Newstin makes it a well-suited platform for SaaS solutions. Newstin is also a bi-directional application in its own right: by imposing order on unstructured data, it leverages its own extensive metadata collection for business intelligence and enterprise performance management. Content must be organized first in order to maximize knowledge-mining capability.

Topics and Taxonomy

Newstin document processing is based on a detailed structure of topics (categories) organized hierarchically into taxonomies developed by a team of information specialists. At present, there are over one million topic nodes in the Newstin taxonomies, covering the entire information spectrum. Topic nodes are general themes or areas of interest such as Politics, Business or Sports, with increasing degrees of specificity: from, for example, Information Technologies or Soccer down to highly specific elements such as news on graphics cards or news about a specific soccer club or even a single player. Newstin’s proprietary taxonomies were developed entirely in-house and have been built in a cross-language configuration from the very beginning. As a result, they carry none of the legacy burdens common to taxonomies developed elsewhere.

Cross-language Information Retrieval

Within any of the Editions, users can activate the Cross-Language Function, which provides instant access to articles in the same category but in other languages. For example, while viewing the category Energy in the US Edition, the user can click on one of the national flag icons to view articles about energy issues written in nine other languages, e.g. French, German, Russian or even Arabic. The function lets users select a topic of interest in their language of choice and, with a single click, access information on the same topic in foreign languages that they would otherwise be unlikely to find. For the user’s convenience, Newstin.com also provides free machine translation (a third-party service) for content retrieved using the cross-language function.
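
One way to picture the cross-language function is a topic node that is independent of language and carries one label per language, with articles tagged by node rather than by label. The sketch below is a minimal illustration under that assumption; the names TopicNode, Article and cross_language_view are hypothetical and do not describe Newstin’s internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """A language-independent topic with one label per language (hypothetical model)."""
    node_id: str
    labels: dict  # language code -> localized category name


@dataclass
class Article:
    title: str
    language: str
    node_ids: set = field(default_factory=set)  # topics the article was categorized into


def cross_language_view(articles, node_id, current_language):
    """Return titles of articles in the same category, grouped by every other language."""
    by_language = {}
    for article in articles:
        if node_id in article.node_ids and article.language != current_language:
            by_language.setdefault(article.language, []).append(article.title)
    return by_language


energy = TopicNode("energy", {"en": "Energy", "fr": "Énergie", "de": "Energie"})
articles = [
    Article("Oil prices rise", "en", {"energy"}),
    Article("Le prix du pétrole augmente", "fr", {"energy"}),
    Article("Ölpreise steigen", "de", {"energy"}),
    Article("Bundesliga Ergebnisse", "de", {"soccer"}),
]

# A user browsing Energy in an English edition asks for the other languages.
other_languages = cross_language_view(articles, energy.node_id, "en")
```

Because the lookup goes through the node identifier, no translation of the category name is needed at query time.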

Contextual Information Retrieval

For any given category and each search query executed, Newstin.com displays additional contextual information in the form of hyperlinks to categories that share contextual traits with the category currently being viewed. Grouping contextual hyperlinks by type (topical category, person, organization, region and company) makes information retrieval using Newstin.com even more convenient and helps with knowledge discovery. Contextual hyperlinks are evaluated dynamically for each specific situation and updated on the fly, so that users can navigate according to the latest developments relating to specific news items. In search results, contextual hyperlinks let users go directly to the relevant categories identified instead of reading through conventional search results.
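
The text does not say how contextual traits are computed. As a rough sketch, assume that two categories share context when they are assigned to the same recent articles; the function contextual_links, its parameters and the sample data below are illustrative assumptions only.

```python
from collections import Counter, defaultdict


def contextual_links(recent_articles, current, category_types, top_n=3):
    """Group the categories that co-occur most often with `current` by their type.

    recent_articles: iterable of sets of category names, one set per article.
    category_types: category name -> type (person, organization, region, company);
    anything missing from the mapping is treated as a topical category.
    """
    counts = Counter()
    for categories in recent_articles:
        if current in categories:
            counts.update(c for c in categories if c != current)

    grouped = defaultdict(list)
    for category, _ in counts.most_common():  # most frequent co-occurrences first
        kind = category_types.get(category, "topical category")
        if len(grouped[kind]) < top_n:
            grouped[kind].append(category)
    return dict(grouped)


recent = [
    {"Energy", "OPEC", "Saudi Arabia"},
    {"Energy", "OPEC", "BP"},
    {"Energy", "BP"},
    {"Soccer", "FIFA"},
]
types = {"OPEC": "organization", "BP": "company",
         "Saudi Arabia": "region", "FIFA": "organization"}

links = contextual_links(recent, "Energy", types)
```

Recomputing the counts over a sliding window of recent articles is one simple way to keep such links in step with the latest news.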

Similarity Grouping

The Newstin Similarity module is an application of tf-idf combined with linguistic processing to improve accuracy. Every article is analyzed for patterns of its most significant terms, which are compared with those of all available articles; groups of Related Articles are then compiled and returned. Grouping Related Articles is useful because the process identifies common traits in content from many sources and thus indicates which news stories and topics are receiving extensive global coverage. Users viewing output from Newstin do not need to sift through duplicate information, and full coverage is immediately available should they need it. Another application of Similarity Grouping is sorting output by the number of articles in each similarity group. The larger the group of similar articles, the more media sources have covered the particular content. In other words, the larger the number of articles written on a given theme or news item, the wider the global coverage, which indicates the level of importance of or interest in the item.

Information Sources and Source Discovery

An independent Source Discovery module identifies information sources and classifies them according to a number of criteria including language, quality, genre (news, blogs and press releases) and audience location. In addition to visualization and listing of categorized content, source classification metadata is also used in the content gathering process. All newly identified sources are evaluated for suitability and quality to ensure the relevance and quality of Newstin metadata. A set of carefully selected sources is used to gather content for processing and publication on Newstin.com. This set is continuously updated to reflect the dynamics of the Internet environment and in response to user feedback.

A very large number of information sources is needed to fully leverage the detailed Newstin taxonomy structure. To cater for the deeper, more specific needs of users, a larger number of focused sources must be tracked. At present, sources delivered via RSS and Atom feeds are supported. Newstin is also capable of processing sources recommended by users, which enables the inclusion of suitable individual sources and takes user feedback into account.
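
Returning to the Similarity Grouping section: the tf-idf comparison and the sorting by group size can be sketched in a few lines. This is a minimal illustration, not Newstin’s implementation. It assumes plain tf-idf with cosine similarity, replaces linguistic processing with lowercasing and whitespace splitting, and uses a greedy grouping rule with an arbitrary threshold; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def group_similar(texts, threshold=0.05):
    """Greedily attach each article to the first group whose seed article is similar enough.

    Returns lists of article indices, largest group first.
    """
    vectors = tfidf_vectors([t.lower().split() for t in texts])
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found: start a new one
    return sorted(groups, key=len, reverse=True)


texts = [
    "oil tanker spill pollutes coast",
    "tanker oil spill reaches coast beaches",
    "team wins soccer cup final",
    "oil spill cleanup begins near coast",
]
groups = group_similar(texts)
```

Sorting by group size puts the most widely covered story first, which mirrors the coverage-as-importance idea described in that section.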

Sentiment Analysis

Sentiment Analysis is a feature developed by Newstin to indicate the overall tone of an article or article collection, that is, whether the content is positive, negative or neutral. For example, an article about an oil spill at sea would most likely contain negative wording because of the effects of the incident. Information about sentiment is an inherent part of the metadata that Newstin produces for each processed article. Sentiment is also monitored for each category in the Newstin taxonomy over specified time intervals. The results of such monitoring include, for example, overviews of positive vs. negative news coverage for thousands of individual companies and persons.
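
How Newstin scores sentiment is not described here. As a toy illustration of the general idea, the sketch below assumes a simple word-list approach: count positive and negative words per article, then tally the labels per category. The word lists, function names and sample articles are invented for the example.

```python
from collections import Counter

# Tiny illustrative word lists; a real system would need far richer resources.
POSITIVE = {"gain", "growth", "profit", "success", "win"}
NEGATIVE = {"spill", "damage", "loss", "pollution", "crisis"}


def article_sentiment(text):
    """Label a text positive, negative or neutral by counting listed words."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def category_overview(articles):
    """Count positive, negative and neutral articles per category.

    articles: iterable of (category, text) pairs.
    """
    overview = {}
    for category, text in articles:
        overview.setdefault(category, Counter())[article_sentiment(text)] += 1
    return overview


sample = [
    ("BP", "Oil spill causes damage and pollution along the coast."),
    ("BP", "Quarterly profit shows strong growth."),
    ("BP", "Tanker crisis deepens after spill."),
]
overview = category_overview(sample)
```

Running the same tally over articles from a given time interval would give the kind of positive vs. negative coverage overview mentioned above.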

Newstin Statistics

Newstin Statistics is a large database of diverse statistical data that Newstin continuously produces from all collected content and extracted metadata. It contains data from hundreds of thousands of categories in eleven languages, retrieved from over 160 thousand sources. Newstin unconditionally meets all confidentiality requirements and does not collect any personal information about Newstin users or statistical data about customers’ proprietary content. In particular, Newstin Statistics collects information about news coverage of companies, persons and topical categories and their mutual relations, including whether coverage is positive or negative. This combined data supports both comprehensive assessment of past events and forecasting of future developments. Newstin Statistics can answer simple enquiries, such as the news coverage of a specific area of interest, company or person, and it is also a valuable tool for complex analysis, e.g. assessing the effectiveness of a PR campaign or, based on global press coverage, the extent and nature of relationships between certain companies.

The continuously growing Newstin statistical collection forms the foundation of new Newstin applications, while subsets of Newstin Statistics have been incorporated into existing applications. Newstin Statistics offers various views of Newstin data so that each user can find what suits them. Newstin data is primarily about media coverage, and we offer many methods for extracting the information that matters to business users. This is no easy task: media articles consist of free-form sentences without any predefined structure, so their content is hard to present in a compressed, quantified form such as numbers and graphs.

Visualization

In this field Newstin has taken a proactive approach and provides products for data visualization, including custom-made interactive reports and charts. Newstin has thus taken the first steps toward meeting the needs of analysts, who require information in an efficient, user-friendly format in order to make effective use of the valuable metadata Newstin produces. Existing visualization examples include NewstinMap and ConnectingVIP.com. NewstinMap is a Flash application that provides convenient browsing within a structure of many thousands of taxonomy nodes. ConnectingVIP.com is based on a general multi-purpose Flash application that visualizes relationships derived from automated analysis of context captured from news articles. As a proof of concept, ConnectingVIP displays on an interactive graphical map how various people are associated.

News Organizer

Multilingual news aggregator provided as a white label product powered by award-winning technology
