1. Introduction
A new breed of Web-based data integration application is sprouting up across the Internet. Often referred to as mashups, these applications owe their popularity to their emphasis on interactive user participation and to the Frankenstein-like way in which they stitch together third-party data. We chose the word "sprouting" deliberately: mashup Web sites are characterized by the way they spread across the Web, leveraging content and functionality from data sources that lie outside the boundaries of their own organization.
Defining a mashup as a kind of data integration application is admittedly loose. To understand more about what a mashup is, look at the origins of the word: it comes from pop music, where a mashup is a new song made by mixing vocals and instrumental tracks from two different songs (usually from different genres). Like those "bastard pop" songs, mashups are unusual and innovative combinations of content (often drawn from unrelated sources), made for consumption by humans rather than by computers.
So, what does a mashup look like? ChicagoCrime.org offers an intuitive example of a map mashup. One of the first mashups to become widely popular, it combined crime records from the Chicago Police Department's online database with maps from Google Maps. Users can interact with the site, for example by asking it to display a map with pushpins showing the details of all recent home invasions in southern Chicago. The concept and presentation are simple, yet combining crime data with map data provides powerful visualization capabilities.
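To make the idea concrete, the following Python sketch shows the kind of join that sits behind such a map mashup: crime records are matched to map coordinates and filtered down to the pushpins the user asked for. The record fields, the `geocode` dictionary (standing in for a real geocoding service), and the `build_pushpins` function are all hypothetical; this is not how ChicagoCrime.org was actually implemented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CrimeRecord:
    crime_type: str
    address: str
    reported: date


@dataclass
class Pushpin:
    lat: float
    lon: float
    label: str


def build_pushpins(records, geocode, crime_type, since):
    """Join crime records to map coordinates, keeping one crime type reported on or after `since`."""
    pins = []
    for record in records:
        if record.crime_type != crime_type or record.reported < since:
            continue
        coords = geocode.get(record.address)  # hypothetical stand-in for a geocoding service
        if coords is None:
            continue  # cannot place this record on the map, so skip it rather than guess
        lat, lon = coords
        label = f"{record.crime_type}: {record.address} ({record.reported.isoformat()})"
        pins.append(Pushpin(lat, lon, label))
    return pins
```

The resulting pushpins would then be handed to whatever map API draws the markers.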
The Mashup Types section surveys popular kinds of mashups, including map mashups. We also give a brief introduction to the technical landscape involved in building and running mashups. The Technical Challenges and Social Challenges sections then introduce the main technical and social challenges that mashups face.
2. Mashup Types
In this section, we briefly survey some well-known types of mashups.
Map Mashups
In this information age, people collect vast amounts of data about things and activities, much of it annotated with location information. All of these diverse data sets containing location data can be presented graphically in striking ways using maps. One of the main forces behind the mashup boom was Google's public release of its Google Maps API. It opened the door for Web developers of all kinds (hobbyists, tinkerers, and others) to put all types of data on maps (from nuclear disasters to Boston's CowParade cows). Not wanting to fall behind, Microsoft (Virtual Earth), Yahoo (Yahoo Maps), and AOL (MapQuest) quickly released their own APIs.
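Many map APIs and mapping libraries can load point data encoded as GeoJSON (RFC 7946), so converting location-annotated records into GeoJSON is one common first step in a map mashup. The sketch below assumes each record is a dictionary with hypothetical `lat` and `lon` keys plus arbitrary other fields; it does not call any particular vendor's map API.

```python
import json


def to_geojson(records):
    """Convert location-annotated records into a GeoJSON FeatureCollection."""
    features = []
    for record in records:
        features.append({
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude], not [lat, lon].
            "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
            # Every non-coordinate field becomes a property shown in the marker's popup.
            "properties": {k: v for k, v in record.items() if k not in ("lat", "lon")},
        })
    return {"type": "FeatureCollection", "features": features}
```

Calling `json.dumps(to_geojson(records))` yields a document that a GeoJSON-aware map layer can display. The longitude-first ordering is a frequent source of markers landing in the wrong hemisphere.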
Video and Image Mashups
The rise of photo-hosting and social networking sites (such as Flickr, which provides its own API for sharing photos) has led to many interesting mashups. Because content providers store metadata with the images they host (such as who took the photo, what it shows, and when and where it was taken), mashup designers can combine these photos with other information associated with that metadata. For example, a mashup could analyze a song or poem and stitch together related photos, or display a social network graph built from shared photo metadata (title, timestamp, or other fields). Another example might take a Web site as input (for example, a news site such as CNN) and render its text as photos by matching tagged photos to words in the news.
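The social-graph idea can be sketched in a few lines: owners whose photos share a tag become linked. The `owner` and `tags` fields are hypothetical stand-ins for whatever metadata a photo-hosting API returns, and matching tags case-insensitively is a simplifying assumption.

```python
from collections import defaultdict
from itertools import combinations


def owner_graph(photos):
    """Build an undirected adjacency map linking owners whose photos share a tag."""
    owners_by_tag = defaultdict(set)
    for photo in photos:
        for tag in photo["tags"]:
            owners_by_tag[tag.lower()].add(photo["owner"])

    edges = defaultdict(set)
    for owners in owners_by_tag.values():
        # Every pair of owners who used the same tag gets an edge in both directions.
        for a, b in combinations(sorted(owners), 2):
            edges[a].add(b)
            edges[b].add(a)
    return dict(edges)
```

Owners with no shared tags simply do not appear in the result; a display layer would draw the returned adjacency map as a graph.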
Search and Shopping Mashups
Search and shopping mashups existed long before the term mashup was coined. Before Web APIs appeared, quite a few shopping tools, such as BizRate, PriceGrabber, MySimon, and Google's Froogle, used B2B technologies or screen scraping to aggregate comparative price data. To encourage the development of mashups and other interesting Web applications, consumer sites such as eBay and Amazon have released APIs for programmatic access to their content.
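At its core, a price-comparison mashup merges offers from several sources and keeps the best one per product. This sketch assumes the offers have already been normalized to a shared product identifier and a single currency (in practice, often the hardest part); the `Offer` type and the source names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    source: str
    product_id: str
    price_cents: int  # integer cents avoid floating-point rounding in comparisons


def cheapest_offers(*feeds):
    """Merge offers from several sources, keeping the lowest-priced offer per product."""
    best = {}
    for feed in feeds:
        for offer in feed:
            current = best.get(offer.product_id)
            if current is None or offer.price_cents < current.price_cents:
                best[offer.product_id] = offer
    return best
```

On a tie, the offer seen first is kept, so the order in which feeds are passed acts as a tie-breaker.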
News Mashups
News sources (such as the New York Times, the BBC, or Reuters) have used syndication technologies such as RSS and Atom since 2002 to publish news feeds on various topics. A mashup built on syndication feeds can aggregate a user's feeds and present them on the Web, creating a personalized newspaper tailored to the reader's unique interests. Diggdot.us is one such example, combining technology-related content from Digg.com, Slashdot.org, and Del.icio.us.
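A minimal sketch of feed aggregation using only the Python standard library: each RSS 2.0 document is parsed, items are merged, duplicate links are dropped, and the result is sorted newest first. It assumes every item carries a `pubDate` with an explicit time zone (undated items are skipped) and ignores Atom feeds, which use a different schema. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_rss(xml_text, source):
    """Return (published, title, link, source) tuples for the items in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if raw_date is None:
            continue  # cannot place an undated item in the timeline
        entries.append((
            parsedate_to_datetime(raw_date),  # RSS uses RFC 822-style dates
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            source,
        ))
    return entries


def merge_feeds(parsed_feeds):
    """Merge parsed feeds, keep the first entry seen for each link, and sort newest first."""
    seen_links = set()
    merged = []
    for entries in parsed_feeds:
        for entry in entries:
            link = entry[2]
            if link in seen_links:
                continue
            seen_links.add(link)
            merged.append(entry)
    merged.sort(key=lambda entry: entry[0], reverse=True)
    return merged
```

Deduplicating by link is a simple heuristic; the same story syndicated under different URLs would still appear twice.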
3. Technical Challenges
As in other data integration fields, mashup development poses many technical challenges, and they become more severe as mashup applications gain richer features and functionality. This section briefly introduces some of these challenges; some can now be solved or mitigated, while others remain open.
Data Integration Challenges: Semantics and Data Quality
Surveys show that data integration within the enterprise virtual organization is a primary concern of today's enterprise IT. (Here, we use the term virtual organization to mean a federation of many business units, each contained in its own administrative domain.) Like enterprise IT managers, many of whom are busy integrating legacy data sources (for example, to build enterprise dashboards that reflect current business conditions), mashup developers face challenges that arise from sharing semantics across heterogeneous data sets. To understand what mashup developers are up against, one need only look at the integration challenges facing enterprise IT.
For example, translation systems must be designed between data models. When converting data into a common format and the mapping is incomplete (for example, one source's address type contains a country field while another's does not), we must make reasonable assumptions. These challenges are compounded by the fact that mashup developers may not be experts in the source data models, which may belong to third parties, so the reasonable assumptions may not be intuitive or obvious.
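The address example above can be sketched as two translation functions into a common model. The field names for both sources and the default country are hypothetical; the point is that the assumed value is recorded explicitly (`country_assumed`) so that downstream code can tell inferred data from source data.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    country: str
    country_assumed: bool  # True when the country was filled in by assumption


def from_source_a(record):
    # Hypothetical source A: its address model includes a country field.
    return Address(record["street"], record["city"], record["country"], False)


def from_source_b(record, default_country="US"):
    # Hypothetical source B: no country field, so we must assume one.
    return Address(record["addr1"], record["town"], default_country, True)
```

Keeping the assumption visible lets a mashup later revise or flag it instead of silently treating a guess as fact.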
Beyond missing data and incomplete mappings, mashup designers may find that the data they wish to integrate is not amenable to automated processing and requires extensive cleansing. For example, law enforcement arrest records may be entered inconsistently, using common abbreviations for some names (e.g., "mkt sqr" in one record and "Market Square" in another), which makes it very difficult to infer automatically that two values are equal, even with good heuristics. Semantic modeling technologies such as RDF can ease automated reasoning across different data sets, provided that the semantics are built into the data storage medium. Legacy data sources, however, usually require substantial analysis and data cleansing effort before they can be used with semantic modeling technologies.
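The abbreviation problem can be partly addressed with a normalization heuristic. This sketch lower-cases names, strips punctuation, and expands tokens from a hypothetical abbreviation table. It also shows why heuristics only go so far: a token such as "st" could mean "street" or "saint", and a fixed table cannot tell which.

```python
import re

# Hypothetical abbreviation table; a real one would be derived from the source data.
ABBREVIATIONS = {"mkt": "market", "sqr": "square", "sq": "square", "st": "street", "ave": "avenue"}


def normalize_street(name):
    """Lower-case, drop punctuation, and expand known abbreviations token by token."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_place(a, b):
    """Heuristically decide whether two street names refer to the same place."""
    return normalize_street(a) == normalize_street(b)
```

For example, `same_place("mkt sqr", "Market Square")` returns `True`, while "Market Street" and "Market Square" still compare as different.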
Mashup developers may also face issues that IT integration managers do not, one of which is data pollution. By design, many mashups solicit input from public users. Research on wiki applications shows that this is a double-edged sword: open contribution can be very powerful and foster best-of-breed data innovation, but it can also introduce inconsistent, incorrect, or misleading entries. Such entries can undermine the credibility of the data and ultimately reduce the value the mashup provides.
Another integration problem mashup developers face arises from the screen scraping techniques that are often needed to acquire data. As mentioned in the previous section, building parsing and acquisition tools and data models requires significant reverse-engineering effort. Even in the best case, where these tools and models can be created, any change in how the source site presents its content can break the integration process and cause errors in the mashup application.
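The fragility of screen scraping is easy to see in code. This sketch uses Python's `html.parser` to pull prices out of `<span class="price">` elements, a page layout invented for illustration. Because the scraper depends on that exact markup, it raises an error when nothing matches instead of quietly returning empty results to the mashup.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect the text inside <span class="price"> elements (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; require an exact class="price".
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        # Assumes price spans contain no nested spans.
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def scrape_prices(html):
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    if not parser.prices:
        # The reverse-engineered markup is gone: fail loudly rather than show bad data.
        raise ValueError("no prices found: the source layout may have changed")
    return parser.prices
```

If the source site renames the class or switches to another element, every call starts raising `ValueError`, which is exactly the breakage described above, made visible.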
Component Challenges
Although the Ajax model of Web development can provide a richer, more seamless user experience than traditional full-page refreshes, it also poses some challenges. At its most basic, Ajax uses the browser's client-side scripting capabilities together with its DOM to achieve a method of content delivery that the browser's designers did not entirely envision. (Perhaps this hack-like nature adds to Ajax's appeal.) However, it also exposes Ajax-based applications to the same browser compatibility issues that have plagued Web developers since Microsoft created Internet Explorer. For example, the Ajax engine uses an XMLHttpRequest object to exchange data asynchronously with the remote server. In Internet Explorer 6, this object is implemented with ActiveX rather than native JavaScript, so ActiveX must be enabled.
A more fundamental requirement is that Ajax needs JavaScript to be enabled in the user's browser. This may be a reasonable assumption for most users, but some browsers and automated tools do not support JavaScript or have it disabled. Such tools include the robots, spiders, and Web crawlers that gather information for Internet and intranet search engines. Unless they provide fallback functionality, Ajax-based mashups may lose part of their user base and become less visible to search engines.
Using JavaScript to update page content asynchronously also creates user interface issues. Because the content is no longer tied to the URL in the browser's address bar, the browser's BACK button and bookmarks may not work as users expect. In addition, although Ajax can reduce latency by requesting incremental content updates, poor design can hurt the user experience; for example, when updates are very fine-grained, their sheer number and overhead can consume all available resources. Designers must also consider how to support users while the interface loads or content is updated (for example, with visual feedback such as progress bars).
As with any distributed, cross-domain application, mashups raise security concerns that developers and content providers need to address. Identity can be a thorny subject, because the traditional Web was built primarily for anonymous access. Single sign-on is a desirable feature, but multiple competing technologies exist (from Microsoft Passport to the Liberty Alliance), which can lead to a tangle of identity namespaces that must be integrated. Content providers may build authentication and authorization models into their APIs (requiring secure identities or securely asserted attributes) to enforce business models involving paid subscriptions or sensitive data. Sensitive data may also require some level of confidentiality (that is, encryption), and developers must know when it can be integrated with other resources without introducing risk. Identity also matters for auditing and regulatory compliance. Finally, because data integration happens on both the server and the client, delegating the user's identity and credentials to the mashup service may also become a requirement.
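Credential delegation can take many forms. As one illustration (not any of the technologies named above), the sketch below has a content provider issue a short-lived, HMAC-signed token naming the user, the mashup service it is intended for, and a scope; the provider later verifies the signature, audience, and expiry when the mashup presents the token. The secret, claim names, and functions are hypothetical, and a production system would use an established standard rather than this hand-rolled format.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical key known only to the content provider.
SECRET = b"provider-secret"


def _sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def issue_token(user, audience, scope, ttl=300, now=None):
    """Issue a token letting `audience` (a mashup service) act for `user` within `scope`."""
    now = time.time() if now is None else now
    claims = {"sub": user, "aud": audience, "scope": scope, "exp": now + ttl}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    return body.decode() + "." + _sign(body)


def verify_token(token, audience, now=None):
    """Check signature, intended audience, and expiry; return the claims if all pass."""
    now = time.time() if now is None else now
    body, _, sig = token.rpartition(".")
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig, _sign(body.encode())):
        raise PermissionError("invalid signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["aud"] != audience:
        raise PermissionError("token was issued for a different service")
    if claims["exp"] < now:
        raise PermissionError("token has expired")
    return claims
```

Scoping the token to one audience and a short lifetime limits the damage if a mashup leaks it, without handing the mashup the user's actual password.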
4. Social Challenges
Beyond the technical challenges described in the previous section, social problems have emerged (or are about to emerge) as mashups become more popular.
One of the most serious social issues facing mashup developers is striking a balance between protecting intellectual property and consumer privacy on the one hand and publicity and the free flow of information on the other. Content providers, whether unwitting targets of screen scraping or willing providers of APIs for data retrieval, may need to determine whether their content is being used in ways they have not sanctioned. Mashup Web applications are still in their infancy, and many are written by hobbyists in their spare time. These developers may not be aware of (or care about) issues such as security. In addition, content providers are only beginning to see the value of offering APIs for machine-based access to their content, and many do not consider it a core business concern. Together, these factors result in a lot of low-quality software today, because testing and quality assurance take a back seat to proof-of-concept work and innovation. To help the software development process mature, the community as a whole must work together to develop open standards and reusable toolkits.
Before mashups can move from cool toys to serious applications, much work remains to develop robust standards, protocols, models, and toolkits. For this to happen, software industry leaders, content providers, and entrepreneurs must recognize mashups as a viable business model. API providers need to decide whether to charge for their content and, if so, how (for example, by subscription or per use). They might also offer different levels of quality of service. Some marketplace providers, such as eBay or Amazon, may find that free APIs increase sales volume. Mashup developers may pursue advertising-based revenue models, or build interesting mashup applications to gain recognition.
5. Conclusion
Mashups are a fairly new genre of Web application. Combining data modeling techniques derived from the Semantic Web with loosely coupled, service-oriented, platform-independent communication protocols will ultimately provide the infrastructure needed to build applications that fully exploit and integrate the vast amount of information on the Web. As mashups attract more and more attention, it is important to understand how they will bear on social issues (such as the tension between public use and intellectual property protection) and on other fields concerned with integrating data across organizational boundaries. It will be interesting to see how they affect areas such as grid computing and B2B workflow management.