Large-scale Web Data and Performance Bottlenecks from Web 1.0 to 5.0
The main sources that generate large-scale web data
In this section, we identify the main sources that contribute to large-scale web data in each generation of the Web.
Web 1.0
Web 1.0, the first stage of the World Wide Web, was based on hyperlinked web pages (Sahu, Mohapatra & Balabantaray, 2016). Nath, Dhar and Basishtha (2014) state that HTML, HTTP and URI are the core technologies of Web 1.0 and were responsible for its contribution to large-scale web data. Web 1.0 also used server-side technologies such as ASP, PHP, JSP, CGI and Perl, and client-side technologies such as JavaScript, VBScript and Flash, each of which had its share in the growth of web data (Nath, Dhar & Basishtha, 2014).
Web 2.0
Ohara, Nagpurkar, Ueda and Ishizaki (2009) describe Web 2.0 technologies as enabling easy collaboration and sharing by allowing users to contribute, modify and aggregate content using applications such as wikis, blogs, social networking communities and mashups. Web 2.0 applications also make heavy use of Ajax, which allows asynchronous communication between client and server, to provide a richer user experience. Nath, Dhar and Basishtha (2014) depict Web 2.0 as a version of the web that changed the way people interact. The Web 2.0 features they mention that we found contributing to large-scale web data include:
1) Individual production and user-generated content. This revolutionized the web by changing the producer-consumer relationship between website owners and readers: readers could contribute their own content, for example through wikis, blogs and comments.
2) Harnessing the power of the crowd. Web 2.0 introduced the idea of reusing the collective information contributed by participants, including crowdsourcing.
3) Data on an epic scale, such as user-contributed data, which can be collected indirectly and aggregated in new ways.
4) The architecture of participation, a way of designing an online technology so that it facilitates participation and supports collaborative knowledge construction.
5) Network effects: the usefulness of a system increases as more users join it.
6) Openness, which is mainly concerned with open access, open software and the use and reuse of free data.
Web 3.0
Web 3.0 aims to define structured data and link it to enable more effective discovery, automation, integration and reuse across applications. Web 3.0 is also known as the Semantic Web, a web that represents things in a way computers can understand. The main purpose of the Semantic Web is to make the web readable by machines, not only by humans (Aghaei, Nematbakhsh & Farsani, 2012). Web 3.0 is the era in which machines started to understand human input. To make this possible, graph-like languages were developed to create nodes and relationships that represent real-world data, and data expressed in these languages made up the majority of the large-scale web data of this era.
Web 4.0
Sahu, Mohapatra and Balabantaray (2016) describe Web 4.0 as a symbiotic web that aims to enable interaction between humans and machines. In simple terms, machines would become clever enough to read the content of the web and react by deciding what to execute, and in what order, so that websites load quickly with superior quality and performance, and to build more powerful interfaces (Aghaei, Nematbakhsh & Farsani, 2012). The major data source of this era is the addition of many IoT devices, such as smart home devices, fitness watches, health-monitoring chips and self-driving cars; the logs and user data generated by these devices are the main contributors to large-scale web data.
Web 5.0
Web 5.0, also known as the “Symbionet Web”, is designed to be highly decentralized: devices or machines will be able to discover other interconnected devices and build a model of the Web (Alam, Cartledge & Nelson, 2014). Alam, Cartledge and Nelson (2014) claim that the current web is emotionless, whereas Web 5.0 is designed to incorporate emotion and take into account the feelings of the user. It is driven by existing technologies that measure feelings and their effects (Algosaibi, Albahli & Melton, 2015). The metadata needed to capture and support this AI capability is the major contributor to large-scale web data in this era.
Typical places where performance bottlenecks occur when accessing large-scale data
In this section, we discuss where the major bottlenecks in accessing large-scale data occur in each generation of the Web.
Web 1.0
As Web 1.0 was the very first development of the WWW, its major bottleneck was that pages designed with it were understandable only to humans (Sahu, Mohapatra & Balabantaray, 2016). Another issue with Web 1.0 was its slowness: whenever new information was added, pages had to be refreshed. Its failure to support two-way communication led to Web 1.0 being replaced by more recent and more robust versions of the web (Nath, Dhar & Basishtha, 2014).
Web 2.0
Ohara, Nagpurkar, Ueda and Ishizaki (2009) describe Web 2.0 applications as frequently retrieving and updating persistent data. This can lead to frequent database accesses, lock contention and reduced performance. The authors also show that problems in the persistence layer, arising from the data-intensive nature of Web 2.0 applications, can lead to poor scalability that prevents applications from exploiting current and future multicore architectures. Potential I/O bottlenecks on the database server can be eliminated by placing the database on a RAM disk, and high-performance multi-core processors can be used to increase multi-threading.
Web 3.0
Algosaibi, Albahli and Melton (2015) state that Web 1.0 and Web 2.0 are about connecting information, whereas Web 3.0 is about connecting knowledge and semantically structuring documents. The main idea of Web 3.0, or the Semantic Web, is to shift from publishing data as web pages in the form of HTML documents to publishing it in forms that allow machines to understand the content. To achieve this goal, new approaches, languages, technologies and data representation models had to be built (Algosaibi, Albahli & Melton, 2015). A variety of semantic languages and standards are maturing, and different applications, tools and services have been built. Albahli and Melton (2014) describe the invention of machine-processable graph data models such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF acts as a data model used to manage, structure and reason about the data found on the Web, and to show how the data relate to reality. Locating and extracting RDF data is the major performance bottleneck of Web 3.0 (Cure, Naacke, Randriamalala & Amann, 2015).
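To make the RDF data model and the cost of "locating" data more concrete, the following is a minimal sketch of an in-memory triple store in Python. It is not LiteMat or any other system cited here; the class name `TripleStore`, the example IRIs under `http://example.org/`, and the index layout are illustrative assumptions. Each statement is a (subject, predicate, object) triple, and one hash index per position lets a query pattern with a bound term avoid scanning every triple.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Hypothetical in-memory RDF-style store indexed by subject, predicate and object."""

    def __init__(self) -> None:
        self.triples: set = set()
        # One index per triple position, so a bound term narrows the search.
        self.by_s: dict = {}
        self.by_p: dict = {}
        self.by_o: dict = {}

    def add(self, s: str, p: str, o: str) -> None:
        t = (s, p, o)
        self.triples.add(t)
        self.by_s.setdefault(s, set()).add(t)
        self.by_p.setdefault(p, set()).add(t)
        self.by_o.setdefault(o, set()).add(t)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        bound = [index.get(term, set())
                 for index, term in ((self.by_s, s), (self.by_p, p), (self.by_o, o))
                 if term is not None]
        # Start from the smallest candidate set; with nothing bound, scan everything.
        pool = min(bound, key=len) if bound else self.triples
        for t in pool:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


EX = "http://example.org/"
store = TripleStore()
store.add(EX + "alice", EX + "knows", EX + "bob")
store.add(EX + "alice", EX + "worksFor", EX + "acme")
store.add(EX + "carol", EX + "knows", EX + "bob")

# Who knows bob?
print(sorted(s for s, _, _ in store.match(p=EX + "knows", o=EX + "bob")))
# prints ['http://example.org/alice', 'http://example.org/carol']
```

Production RDF stores commonly go further, for example by dictionary-encoding terms as integers and maintaining several index permutations, but even this toy version shows why queries whose positions are all unbound degrade to full scans as the graph grows.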
Web 4.0
The critical development of Web 4.0 is the movement of web functionality into the physical world, or the notion of fog computing coexisting with cloud computing. This enabled users to create and control their own data. The ubiquity of mobile devices, their increasing performance, and the growing capabilities of wireless connections and telecommunications boosted both data generation and personalization in Web 4.0 (Ferrer-Roca, Tous & Milito, 2014; Nath, Dhar & Basishtha, 2014). This is what we see as the main performance bottleneck of this era: users are no longer only consumers but also individual owners of their content, and the load from connected IoT devices and the data they generate is the main bottleneck of web performance.
Web 5.0
Khanzode and Sarode (2016) state that Web 5.0 can be considered the decentralized symbiotic web. Web 5.0 attempts to create personal servers for any personal data or information stored on the net, accessed via smart communicators such as smartphones, tablets or personal robots that will be able to surf alone in the symbiotic 3D virtual world. According to the authors, the memory and computing power that each interconnected smart communicator needs to process the enormous volume of data required to build the 3D world and to feed its artificial intelligence is the largest contributor to the performance bottleneck of Web 5.0.
The root causes that generate these performance bottlenecks
Arlitt, Cherkasova, Dilley, Friedrich and Jin (2000) attribute the performance problems of Web 1.0 to a lack of good system design, which caused network elements and servers to become bottlenecks. Since the majority of web objects are static, caching them at HTTP proxies can reduce both network traffic and response time.
In 2004, Dale Dougherty defined Web 2.0 as a read-write web, facilitating more flexible web design, creative reuse, updates, modifications, collaborative content creation and collective intelligence gathering. Web 2.0 focused on human-generated or human-related data (Aghaei et al., 2012, pp. 2-3). The performance bottleneck of this era came mainly from the huge volume of content added by Web 2.0 users.
Web 3.0 introduced the idea of linked data: data can be linked, integrated and analyzed across various datasets. Aghaei et al. (2012) refer to this as semantic web data, which is readable and understandable not only by humans but also by machines. The main performance bottleneck of this era comes from graph-oriented, machine-readable data models such as RDF.
The 2010s marked the beginning of the symbiotic web, in which interaction between humans and machines opened the new era of Web 4.0 (Aghaei et al., 2012, p. 8). Web 4.0 consists of intelligent analytics and cyber-physical systems that use sensors, signals and historical data for further data mining (Lee, Kao & Yang, 2014). The major source of bottlenecks in this era is the growing number of gadgets and devices connected to the web, together with the many mobile apps and IoT protocols needed to make them work together; the large volume of log data they generate contributes heavily to the performance bottleneck.
Web 5.0 is the era in which machines start to understand what people want, and emotional interaction between humans and computers increases based on neurotechnology (Benito-Osorio, Peris-Ortiz, Armengot & Colino, 2013). The large-scale big data analytics and algorithms needed to understand people’s behavior are the main bottleneck of this era, which is still in its developmental stage.
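The proxy-caching argument of Arlitt et al. (2000) rests on the observation that static objects can be served from a cache instead of the origin server. The following minimal sketch, assuming a least-recently-used (LRU) replacement policy and a hypothetical `fetch_from_origin` function standing in for a real HTTP request, shows how a proxy cache turns repeated requests into hits that never reach the origin. LRU is used here only as a simple, common policy, not necessarily the policy the authors recommend, and capacity is counted in objects rather than bytes for simplicity.

```python
from collections import OrderedDict
from typing import Callable


class LRUProxyCache:
    """Hypothetical proxy cache holding at most `capacity` objects, evicting the least recently used."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin  # stand-in for a real HTTP fetch
        self.store = OrderedDict()  # url -> response body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)  # mark as most recently used
            return self.store[url]
        self.misses += 1
        body = self.fetch_from_origin(url)  # only misses reach the origin server
        self.store[url] = body
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used object
        return body


origin_calls = []


def fake_origin(url: str) -> bytes:
    # Records each request that reaches the (simulated) origin server.
    origin_calls.append(url)
    return f"<html>{url}</html>".encode()


cache = LRUProxyCache(capacity=2, fetch_from_origin=fake_origin)
for url in ["/index.html", "/logo.png", "/index.html", "/about.html", "/logo.png"]:
    cache.get(url)
print(cache.hits, cache.misses)  # prints 1 4
```

With a capacity of two, the repeated `/index.html` request is a hit, while `/logo.png` is evicted before it is requested again; a larger cache or a workload with more repetition would raise the hit rate and further reduce origin traffic.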
High-level strategies to remove these performance bottlenecks
Paajanen (2017) claims that web data generation has been increasing and is expected to grow fourfold by 2020. The performance bottleneck problem therefore needs to be addressed so that we can get the most out of ever-increasing web big data. At each stage of the Web, different authors have suggested different solutions, some of which now look outdated because technology and the internet have advanced so far. For example, Arlitt, Cherkasova, Dilley, Friedrich and Jin (2000) suggested that increasing access bandwidth, such as the ISP’s connection to the Internet, and proxy caching were important for reducing user latency and resolving the main bottleneck in the system. This solution from the dial-up era has been made largely obsolete by present-day broadband fiber-optic networks.
There are a number of ways to improve the performance of web servers, such as adding load balancers to speed up responses to users, using NoSQL databases in place of relational databases, improving the hashing algorithm used in distributed in-memory NoSQL databases, and using session managers to synchronize the sessions of different web applications (Ji, Ganchev, O’Droma, Zhao & Zhang, 2014). Other strategies proposed by Cure, Naacke, Randriamalala and Amann (2015) to curb the performance bottleneck include improving the encoding strategy for semantic web (RDF) data, materialization solutions, and improving query performance for locating content.
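Ji et al. (2014) mention improving the hashing algorithm of a distributed in-memory NoSQL database without the details being given here. As an illustration of what such a hashing layer does, the following sketch shows consistent hashing, a widely used technique for assigning keys (for example, session IDs) to storage nodes so that adding or removing a node remaps only a fraction of the keys. This is a swapped-in example, not the algorithm used by Ji et al.; the node names, the number of virtual nodes (`vnodes`), and the use of MD5 as a spreading hash are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # MD5 is used only to spread keys uniformly over the ring, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical consistent-hash ring mapping keys to storage nodes."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node occupies several virtual positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key: str) -> str:
        # The key belongs to the first node position clockwise from its hash.
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"session-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Unlike a plain `hash(key) % n` scheme, where changing `n` remaps almost every key, only the keys now owned by the new node move, which keeps cache and session data mostly in place while the cluster scales.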
References
Aghaei, S., Nematbakhsh, M. A., & Farsani, H. K. (2012). Evolution of the world wide web: From WEB 1.0 TO WEB 4.0. International Journal of Web & Semantic Technology, 3(1), 1.
Alam, S., Cartledge, C. L., & Nelson, M. L. (2014). Support for various HTTP methods on the web. arXiv preprint arXiv:1405.2330.
Albahli, S., & Melton, A. (2014, December). ohStore: Ontology hierarchy solution to improve RDF data management. In Internet Technology and Secured Transactions (ICITST), 2014 9th International Conference for (pp. 340-348). IEEE.
Algosaibi, A. A., Albahli, S., & Melton, A. (2015). World Wide Web: A Survey of its Development and Possible Future Trends. In The 16th International Conference on Internet Computing and Big Data-ICOMP’15.
Arlitt, M., Cherkasova, L., Dilley, J., Friedrich, R., & Jin, T. (2000). Evaluating content management techniques for web proxy caches. ACM SIGMETRICS Performance Evaluation Review, 27(4), 3-11.
Benito-Osorio, D., Peris-Ortiz, M., Armengot, C. R., & Colino, A. (2013). Web 5.0: the future of emotional competences in higher education. Global Business Perspectives, 1(3).
Cure, O., Naacke, H., Randriamalala, T., & Amann, B. (2015, October). LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs. In Big Data (Big Data), 2015 IEEE International Conference on (pp. 1823-1830). IEEE.
Ferrer-Roca, O., Tous, R., & Milito, R. (2014, October). Big and Small Data: The Fog. In Identification, Information and Knowledge in the Internet of Things (IIKI), 2014 International Conference on (pp. 260-261). IEEE.
Ji, Z., Ganchev, I., O’Droma, M., Zhao, L., & Zhang, X. (2014). A cloud-based car parking middleware for IoT-based smart cities: design and implementation. Sensors, 14(12), 22372-22393.
Khanzode, K. C. A., & Sarode, R. D. (2016). Evolution of the world wide web: from Web 1.0 to 6.0. International journal of Digital Library services, 6(2).
Lee, J., Kao, H. A., & Yang, S. (2014). Service innovation and smart analytics for Industry 4.0 and big data environment. Procedia CIRP.
Nath, K., Dhar, S., & Basishtha, S. (2014). Web 1.0 to Web 3.0 - Evolution of the Web and its various challenges. In Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on (pp. 86-89). IEEE.
Ohara, M., Nagpurkar, P., Ueda, Y., & Ishizaki, K. (2009). The data-centricity of web 2.0 workloads and its impact on server performance. In Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on (pp. 133-142). IEEE.
Paajanen, S. (2017). Opportunities of big data analytics in supply market intelligence to reinforce supply management.
Sahu, S. K., Mohapatra, D. P., & Balabantaray, R. C. (2016). Information retrieval in the context of checking semantic similarity in web: Vision of future web. Indian Journal of Science and Technology, 9(32).