This article is from the WeChat public account InfoQ (infoqchina). Author: Xue Liang. Title image: Visual China.

ByteDance has 1.5 billion monthly active users worldwide. Supporting them depends on solid infrastructure and on engineers who understand the technology.

The current head of the ByteDance infrastructure team is Liang Yuming, who joined Hulu after graduating from Tsinghua University in 2010 and served as head of Hulu China’s data and advertising team. In 2018, Liang Yuming left Hulu and joined ByteDance to lead its infrastructure team.

The ArchiSummit Global Architects Summit (Beijing) will be held on December 6-7, 2019, and Liang Yuming, who has extensive experience in the practice and management of technical architecture, will serve as conference co-chair. We took this opportunity to interview him. In the interview, he shared his experience with working methods, ways of thinking, technical insights, and technical management over the years. We hope it inspires you.

InfoQ: You worked at Hulu for many years. What did you gain at Hulu that helps in your current job?

First, my time at Hulu taught me to look at my surroundings dynamically. I joined Hulu in 2010, when the company, founded in 2007, was still young. I saw Hulu move from its initial stage through a period of high-speed growth and gradually into maturity, so I took part in a relatively complete enterprise life cycle. At different stages, a company’s business characteristics, talent structure, organizational processes, and technical systems are very different. This experience lets me examine the world around me from a dynamic perspective, judge better which stage a company is in, and make technical decisions that fit that stage.

Second, because I joined Hulu in its early stage, I worked across a fairly broad technology stack. From search and recommendation to mobile development to big data systems, I was involved in different technical and business systems. That knowledge helps me, when working on infrastructure, to understand business needs better and to achieve deeper, win-win cooperation with the business.

InfoQ: What do you think are the similarities and differences between Hulu and ByteDance?

From a cultural point of view, the two have both similarities and differences. The similarity is that Hulu’s basic cultural trait is to emphasize collaboration & consensus rather than command & control, which is very consistent with ByteDance’s values of always staying in start-up mode and being candid, open, and humble. Yiming has described an idea called “Context, not Control”, and both companies rely strongly on bottom-up innovation. The difference is that ByteDance is more daring and breakthrough-oriented and puts more emphasis on not setting boundaries, while Hulu is steadier, which has a lot to do with its business characteristics and user expectations.

In terms of organizational management, the two differ greatly. Hulu is more like a classic tree structure: some important decisions have to rise to a common parent node before consensus is reached. ByteDance is flatter and builds mesh-like links, so multiple teams can quickly reach consensus without escalating to the leadership level. To play an important role at Hulu, you need to be very familiar with Hulu’s processes and mechanisms and advance your ideas within a defined process. At ByteDance, you need to build broader connections and reach consensus through them. The former is more predictable; the latter is more efficient. Each suits the development stage of its company.

In terms of development stage, the two are also very different. Hulu is relatively mature: its business form tends to be fixed, its business model and technical system are relatively stable, its organizational structure and processes are relatively complete, and the emphasis is on integrity. ByteDance is still in a period of rapid development. Its business form and business model are still evolving quickly. It emphasizes breaking through and daring to take responsibility, encourages trying multiple possibilities, and keeps optimizing its organizational structure and processes.

InfoQ: What do you think are the characteristics of ByteDance’s business? What challenges do these characteristics pose to the infrastructure?

First, ByteDance’s business forms are very diverse, and different businesses have very different characteristics. The traditional information-feed business has very large data and service scale and places very high demands on system scalability and performance. Newer businesses such as Feishu and education have very complex data models and very high requirements for system availability and data consistency. These completely different business forms make it hard for the infrastructure to support everything with one streamlined set of systems; a more complex product matrix is needed.

Second, ByteDance’s scale is very large. In China, we have hundreds of thousands of top-spec physical servers, EB-scale storage, and tens of thousands of microservices. Together with the traditional information-feed business model, this makes ByteDance’s business architecture very complicated. Making the system more observable and controllable while keeping it stable, and reducing system cost at the same time, is a big challenge.

Finally, in terms of development stage, ByteDance is still growing rapidly and doing many kinds of exploration. In general, we expect to minimize the constraints on the business (although some constraints are quite reasonable) while providing a richer set of features so that the business can iterate faster. This expectation of light constraints and heavy features is especially challenging for the infrastructure.

InfoQ: How does the ByteDance infrastructure approach these challenges?

First, we need to be clear about our optimization goals. One goal is to support rapid business iteration. At ByteDance’s current stage of business development, supporting rapid iteration well requires the following:

1. The architecture needs to provide a rich product matrix with the interfaces that developers are most familiar with, so that business teams can solve business problems as quickly as possible, rather than providing a very streamlined system that the business has to spend time adapting to;

2. With rich product features, minimize the constraints on users. For example, we do not impose many constraints on data size or data structure when the business uses Redis, and we try to push the optimization down to the infrastructure side. When the Redis data size exceeded the scale limit of Redis Cluster, we developed our own Redis clustering solution;

3. Try to push common capabilities down to the infrastructure side, such as traffic scheduling, unitized deployment, and system disaster recovery.

The other goal is to keep upgrading the architecture. ByteDance’s rapid growth meant that the early architecture was inevitably “rough and ready”. Upgrading the architecture is very important. For ByteDance to go further, we must be determined to change the engine on an aircraft flying at high speed rather than keep patching existing systems. Upgrading the architecture and supporting short-term business growth are therefore equally important goals.

Second, we aligned on our technical philosophy. ByteDance began the evolution to Infrastructure 2.0 in 2018. Its basic feature is the move from “following the business” to being “sourced from the business, beyond the business, and ahead of the business”.

The infrastructure must be “sourced from the business”. Infrastructure that is completely detached from the business is a castle in the air and easily becomes self-indulgent. A really good architecture must come from the reality of the business, which also means that the infrastructure team needs deep contact with the business;

“Beyond the business” means that the basic metrics of the systems the architecture provides exceed business requirements, and that their level of abstraction is higher than what the business asks for. This is very important for companies in high-speed development, because the business changes very fast: if you “just” meet business needs, you will find a few months later that you “just” fail to meet them. From an infrastructure perspective, we should also provide examples of best practices, which means that the infrastructure team focuses not only on Infrastructure but also needs to provide Architecture support;

“Ahead of the business” means that the architecture needs to anticipate the direction of business development and prepare in advance, because many systems take a long time to implement, and moving a certain amount ahead helps the business systems a great deal. It also means that the infrastructure makes many investments for the future.

InfoQ: Under these guiding principles, what has the ByteDance infrastructure actually done?

First, in terms of organizational structure, ByteDance merged the online infrastructure and the offline infrastructure into a single team. The merged infrastructure covers three areas, storage, computing, and the R&D system, both online and offline, and serves as a common base for all ByteDance product lines such as Toutiao, Douyin, and Feishu. Organizational adjustments are inseparable from optimization goals. For example, merging ByteDance’s online scheduling team (based on Kubernetes) and offline scheduling team (based on YARN) into one team laid a solid foundation for rapidly advancing online-offline hybrid deployment and evolving the technical system.

Second, in terms of the technical system, the ByteDance architecture consists mainly of three parts: storage, computing, and the R&D system:

At the storage level (note: storage here includes classic storage systems such as HDFS and object storage, as well as NoSQL, NewSQL, graph databases, and so on), ByteDance’s strategy is to evolve toward a tiered storage architecture and to build, on top of unified pooled storage, a storage product matrix compatible with open-source systems to meet business needs. The tiered structure based on pooled storage helps us unify the underlying implementation: common distributed-systems problems and important optimizations are implemented once and benefit all upper-layer systems, which simplifies the development of upper-layer systems and avoids the operational complexity that a rich storage product matrix would otherwise bring. The open-source-compatible product matrix mainly serves to simplify business onboarding and improve development efficiency;

At the computing level, we are promoting the integration of online and offline computing systems. In the long run, we will merge the underlying layers of Kubernetes and YARN into one resource scheduling system, so that PaaS, FaaS, batch computing, and stream computing all run on a unified scheduling system and resource utilization keeps improving. ByteDance is also exploring further integration of virtualization and containerization;

On the R&D system side (ByteDance collectively calls the RPC framework, PaaS, intelligent operations, the governance system, the stability system, and the performance system the R&D system), most of ByteDance’s businesses are built on PaaS rather than IaaS. PaaS lets the business focus on resources rather than machines, which is very important for allocating resources in a unified way and for unifying the environment management system.

Third, in terms of the cooperation process, the infrastructure team has a relatively complete mechanism for long-term planning, medium-term goals, and short-term execution, and at the same time makes as much architecture information as possible transparent to the business side. For a team whose business changes rapidly and whose size grows quickly, strengthening information synchronization and reducing information asymmetry are very important for building mutual trust and promoting cooperation.

InfoQ: You mentioned that the ByteDance infrastructure needs to meet business needs but also needs to upgrade the architecture. There is some tension between the two. How do you strike the balance?

Managing business expectations is critical. Sometimes the business is not really in such a hurry at that moment, but it cannot see any hope of improvement, so it becomes anxious. Sharing long-term and short-term plans with the business and managing everyone’s expectations matters a great deal.

Requirements converge more than you might imagine. Infrastructure often receives all kinds of requirements, but if you abstract to a higher level, many of them can be grouped together. This requires infrastructure engineers not to execute blindly but to think and abstract more; doing things without thinking leaves behind all sorts of problems and makes iteration slower and slower.

Priorities can be negotiated. The real urgency of most business needs differs considerably from the claimed urgency. Going deep into the business and understanding the real needs and priorities helps us grasp the main contradiction and solve the core problems.

Be an infrastructure team that is good at “fundraising”. The problem infrastructure often faces is that the business feels the architecture team is not moving fast enough, gets anxious, and starts building its own systems. Business teams often care more about short-term practicality, so what they build falls short of infrastructure in generality. If these systems are later handed over to the infrastructure team for maintenance, the team often has to spend more effort rebuilding them, and many become historical debt; if they are not handed over, the company-level infrastructure gradually fragments. To solve this problem, the infrastructure team needs to be good at building relationships with the business. It is important to “raise the banner and guide everyone to follow”: reach consensus with the business on direction and on the solution, steer the key directions, bring in more business resources for joint development, and use the strength of the business side to evolve together.

Technical iteration often needs the support of the organizational structure. If conditions permit, splitting the team is very helpful.

InfoQ: In the process of infrastructure evolution, have you run into any hard technical problems? How did you solve them?

There are certainly many thorny problems along the way. Here I want to talk about one that every fast-growing team runs into but that not every team gets out of quickly: the operations black hole. An operations black hole is when the team’s main energy goes into operations and maintenance, so there is no spare capacity to iterate on the system itself, while the business scale keeps expanding and operations become harder and harder, until the team is deeply trapped. Everyone has ideas on how to prevent it, such as moving operations from manual work to scripts, then to platforms, and finally to intelligent operations, but in practice many teams still fall into the black hole. This is closely related to the technical system and to the stage of business development.

A storage system that I took over a few months ago had too little early investment in operations and in stability, and business usage grew explosively, so almost all of its developers were pulled into operations. To solve this problem, we did several things:

Since it is a black hole, it is very hard to escape on your own strength, so it is necessary to quickly reassign people and expand the team. At this point you can pause the team’s other low-priority work and quickly pull in manpower. Adding people is the crudest method, but it first reduces the per-person operations load so that there is time to optimize;

Just adding people is not enough; you also need to add different kinds of people. Being stuck in an operations black hole may reflect a gap in the existing team’s ability to build highly maintainable systems, so it is necessary to bring in people with different strengths and complementary roles;

On top of that, the most important thing for getting out of the operations black hole is to identify the core problems accurately and avoid scattered, unfocused effort. Sort out the core operations pain points clearly, make a dedicated plan for each, and focus on the key points;

Systems that fall into an operations black hole often suffer from more than a lack of operations tools; the cause is also closely tied to system design and system deployment. Sometimes large changes are needed to get out. But change comes with risk. As a leader, you need to back the team, prepare as thoroughly as possible, and plan for the worst. You should not overreact to incidents that may occur during the change, or it will seriously affect the team’s decision-making;

Finally, the team itself. Morale is often a big problem for a team in an operations black hole. Adding people, planning clearly, and giving everyone clear expectations are critical. Syncing progress regularly and giving positive feedback on positive changes help the team rebuild confidence.

Everyone wants to avoid operations black holes, but they are not rare. Getting out takes not only attention but also the leader’s tilt of resources, willingness to take responsibility for problems, and moral support.

InfoQ: You emphasize that infrastructure development needs to stay ahead of the business, and you also said that ByteDance has a strong culture of innovation. How do you encourage innovation in your team?

The team’s goals should be consistent with the company’s. The emphasis on innovation in ByteDance’s culture is the foundation on which each team builds an atmosphere of innovation. For the architecture team, I believe innovation needs to be guaranteed by mechanisms; innovation without such guarantees is often not reproducible. First, innovation requires that you look beyond what is in front of you and gain a global view. Every two weeks, the ByteDance infrastructure team shares major updates internally to help everyone understand important progress in all directions of the architecture, and the bimonthly summary reinforces this. Second, innovation needs planning. In line with ByteDance’s project cycle, the architecture team updates its long-term plan every two months. The long-term plan is discussed from the bottom up and reflects shared thinking about the long term; because it is iterated every two months, the change each time is not too large. Third, innovation needs organizational support. We have built an applied innovation center team that works with other teams to promote the development and rollout of innovative projects.

Finally, innovation needs incentives. In the annual performance review, we ask every team member to write about a project they initiated themselves rather than one assigned top-down, and we ask every leader to write about a project that was proposed by a team member and delivered with the leader’s support. Together, these mechanisms drive innovation in the architecture.

InfoQ: As the head of infrastructure, you need to pay attention not only to technology but also to managing the technical team. What experience do you have in team management, and which skills are you focusing on now?

In my opinion, a technical manager needs at least three capabilities: Leadership Skills + Management Skills + Technical Skills.

Leadership Skills are used to set the team’s direction, develop the team, and build the organizational culture, ensuring that we have the right people heading in the right direction;

Technical Skills are used for Solution Design, Problem Solving, and Engineering Excellence, ensuring that our technology choices support our vision;

Management Skills are used to establish working patterns, build the organization’s project processes, and promote consensus in external collaboration, ensuring that we move forward in a win-win, risk-controlled manner.

At different stages of a company’s development and at different levels of management, managers focus on different things. Generally, first-line leaders lean more on Technical Skills and second-line leaders put more emphasis on Leadership Skills; start-up leaders emphasize Technical Skills, and leaders in a rapid growth period need stronger Management Skills. As the head of infrastructure at ByteDance, which is quickly becoming a large company, I need to spend more time setting the team’s direction, building the team, establishing project processes, and promoting cooperation. But to lead the infrastructure to its next stage, technology is also very important to me, so I keep learning in all of these areas.

