At the fundamental level, a container is a collection of physical resources such as RAM, CPU cores, and disks on a single node. There can be multiple containers on a single node (or a single large one). Every node in the system is considered to be composed of multiple containers of minimum size of memory (e.g., 512 MB or 1 GB) and CPU. The ApplicationMaster can request any container so as to occupy a multiple of the minimum size.
A container thus represents a resource (memory, CPU) on a single node in a given cluster. A container is supervised by the NodeManager and scheduled by the ResourceManager.
Each application starts out as an ApplicationMaster, which is itself a container (often referred to as container 0). Once started, the ApplicationMaster must negotiate with the ResourceManager for more containers. Container requests (and releases) can take place in a dynamic fashion at run time. For instance, a MapReduce job may request a certain amount of mapper containers; as they finish their tasks, it may release them and request more reducer containers to be started.
Handling of container failures is the responsibility of the applications/frameworks. YARN is responsible only for providing information to the applications/framework. The ResourceManager collects information about all the finished containers as part of the allocate API’s response, and it returns this information to the corresponding ApplicationMaster. It is up to the ApplicationMaster to look at information such as the container status, exit code, and diagnostics information and act on it appropriately. For example, when the MapReduce ApplicationMaster learns about container failures, it retries map or reduce tasks by requesting new containers from the ResourceManager until a configured number of attempts fail for a single task.
Once a container starts, for it to be able to perform its duties, it may depend on the availability of various pieces of information. Some of this information may be static, and some may be dynamic—that is, resolvable only at run time. Static information may include libraries, input and output paths, and specifications of external systems like database or file system URLs, as
- The ApplicationMaster should describe all libraries and other dependencies needed by a container for its start-up as part of its ContainerLaunchContext. That way, at the time of the container launch, such dependencies will already be downloaded by the localization in the NodeManager and be ready for linking directly.
- Input/output paths and file-system URLs are a part of the configuration that is beyond the control of YARN. Applications are required to propagate this information themselves.
- Local directories where containers can write some outputs are determined by the environment variable ApplicationConstants.Environment.LOCAL_DIRS .
- Containers that need to log output or error statements to files need to make use of the log directory functionality. The NodeManager decide the location of log directories at run time. Because of this, a container’s command line or its environment variables should point to the log directory by using a specialized marker defined by ApplicationConstants.LOG_DIR_EXPANSION_VAR (i.e., <LOG_DIR>). This marker will be automatically replaced with the correct log directory on the local file system when a container is launched.
- The user name, home directory, container ID, and some other environment specific information are exposed as environment variables by the NodeManager;