Lunar Lander

This project implements a Lunar Lander game using Python and Pygame. It features manual keyboard controls and an optional autoplay mode driven by a neural network trained with a genetic algorithm. The simulation incorporates basic physics, including gravity and thrust mechanics, providing a challenging landing experience. For performance, the neural network itself runs in a C++ backend, accessed from Python via pybind11.

The project is structured into several Python modules handling configuration, game logic, visuals, and neural network training, along with a C++ interface file.

The game runner

The script main.py serves as the entry point for the application. It parses command-line arguments to select the run mode (play, nn_train, nn_play), initializes Pygame, and sets up the display, optional sound, game logic, and visual components.
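
The argument handling itself is not shown in the original listing; a minimal sketch, assuming a single positional mode argument (the argument layout is hypothetical), might look like this:

# Hypothetical sketch of the entry point in main.py; the three modes come
# from the description above, the argument layout is an assumption.
import argparse

def main():
    parser = argparse.ArgumentParser(description="Lunar Lander")
    parser.add_argument("mode", choices=["play", "nn_train", "nn_play"],
                        help="manual play, GA training, or NN autoplay")
    args = parser.parse_args()

    if args.mode == "nn_train":
        NeuralNetwork().train()   # headless GA training, no game window
    else:
        game_loop(args.mode)      # interactive or NN-driven play

if __name__ == "__main__":
    main()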

The core game_loop function manages the main game cycle, handling events (keyboard input, quit), determining actions (either from the player or the NN), updating the game state via GameLogic, and rendering the scene using draw_game.


# Simplified game loop structure from main.py
def game_loop(mode: str):
    # ... Initialization (Pygame, screen, clock, font, sounds, logic, visuals) ...
    # ... NN Initialization if mode == 'nn_play' ...

    running = True
    game_over = False
    has_started = False # Track if player/NN initiated movement

    while running:
        clock.tick(cfg.fps)
        action = 0 # Default: Noop

        # --- Event Handling ---
        for event in pygame.event.get():
            # ... Handle QUIT, Keydown (Q for quit) ...
            if mode == 'play' and event.type == pygame.KEYDOWN:
                if event.key in [pygame.K_UP, pygame.K_LEFT, pygame.K_RIGHT]:
                    has_started = True  # Start game on first player input

        # --- Action Determination ---
        if not game_over:
            if mode == 'play':
                keys = pygame.key.get_pressed()
                if keys[pygame.K_UP]: action = 1
                elif keys[pygame.K_LEFT]: action = 2
                elif keys[pygame.K_RIGHT]: action = 3
            elif mode == 'nn_play':
                current_state = logic.get_state()
                nn_action = NN.get_action(current_state)
                # ... Logic to handle starting the game automatically ...
                if has_started:
                    action = nn_action # Use NN action if game started

        # --- Game Logic Update ---
        if has_started and not game_over:
            state, done = logic.update(action) # Update physics and check state
            if done:
                game_over = True
                # ... Print landing/crash message ...

        # --- Rendering ---
        draw_game(screen, logic, visuals, sounds, font)

        # --- Save Frame if enabled ---
        # ... pygame.image.save(...) ...

        # --- Game Over Handling ---
        if game_over:
            pygame.time.delay(1500)
            running = False

    pygame.quit()

# ... main function with argparse ...

The draw_game function takes the current game state and renders the terrain, landing pads, lander, thrust flames, and HUD elements (fuel, position, velocity).


# Snippet from draw_game in main.py
def draw_game(screen: pygame.Surface, logic: GameLogic, visuals: LanderVisuals,
              sounds: SimpleNamespace, font: pygame.font.Font):
    render_info = logic.get_render_info()
    # ... Extract x, y, fuel, vx, vy, last_action ...

    screen.fill(c.k) # Clear screen

    # Draw terrain & pads
    # ... pygame.draw.rect(...) ...

    # Draw Lander
    scaled_images = visuals.get_scaled_images()
    screen.blit(scaled_images["lander"], (lander_x, lander_y))

    # Draw Flames based on last action and fuel
    # ... screen.blit(scaled_images["vflames"], ...) ...
    # ... screen.blit(scaled_images["rflames"], ...) ...
    # ... screen.blit(scaled_images["lflames"], ...) ...

    # Play sound if thrusting
    # ... sounds.engine_s.play() ...

    # Draw HUD
    # ... font.render(...) ...
    # ... screen.blit(...) ...

    pygame.display.flip()

Configuration hub

All configuration is centralized in the mod_config.py module. It defines:
- lander_cfg: Lander properties (dimensions, flame offsets, fuel, image name).
- cfg: General settings (window size, FPS, sound enabled, image saving).
- game_cfg: Game physics and layout (initial state, pad positions/sizes, landing constraints, max steps). It also includes the logic for randomizing pad positions (reset_pad_positions, _generate_random_pad_positions); a sketch of these helpers follows the game_cfg example below.
- planet_cfg: Environmental physics (gravity, friction).
- nn_config: Neural network and genetic algorithm parameters (name, layers, population, GA settings, saving options, training parameters).


# Example: Game Configuration from mod_config.py
game_cfg = SimpleNamespace(
    random_position=True,       # randomize pad positions
    x0=np.array([52.0, cfg.height - lander_cfg.height - 52]), # initial position
    v0=np.array([0.0, 0.0]),    # initial velocity
    a0=np.array([0.0, 0.0]),    # initial acceleration
    spad_x1=50,                 # takeoff pad left boundary
    spad_width=80,              # takeoff pad width
    lpad_x1=cfg.width - 400,    # landing pad left boundary
    lpad_width=200,             # landing pad width
    pad_y1=cfg.height - 50,     # landing/takeoff pad top boundary
    pad_height=10,              # landing/takeoff pad height
    terrain_y=50,               # terrain height
    max_vx=0.5,                 # max horizontal speed for landing
    max_vy=2.0,                 # max vertical speed for landing
    max_steps=1000,             # max steps per episode
    current_seed=None           # Seed for pad positions
)
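
The pad-randomization helpers mentioned above are not shown in the original listing. The following is a plausible sketch constrained only by the game_cfg fields; the margins and the left/right split of the window are assumptions:

# Hypothetical sketch of the pad randomization helpers in mod_config.py.
def _generate_random_pad_positions(rng: np.random.Generator):
    margin = 50  # assumed clearance from the window edges
    # Takeoff pad somewhere in the left half of the window
    spad_x1 = int(rng.integers(margin, cfg.width // 2 - game_cfg.spad_width))
    # Landing pad in the right half, clear of the right edge
    lpad_x1 = int(rng.integers(cfg.width // 2,
                               cfg.width - game_cfg.lpad_width - margin))
    return spad_x1, lpad_x1

def reset_pad_positions(seed=None):
    game_cfg.current_seed = seed
    rng = np.random.default_rng(seed)
    game_cfg.spad_x1, game_cfg.lpad_x1 = _generate_random_pad_positions(rng)
    # Keep the initial lander position aligned with the new takeoff pad
    game_cfg.x0[0] = float(game_cfg.spad_x1 + 2)
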

# Example: NN Configuration from mod_config.py
nn_config = SimpleNamespace(
    name="lunar_lander",    # nn name
    hlayers=[8, 64, 16],    # hidden layer structure (Input: 6, Output: 4)
    seed=5247,              # seed
    top_individuals=10,     # GA: top individuals selected
    population_size=300,    # GA: population size
    mixed_population=True,  # GA: use mixed population
    elitism=True,           # GA: keep the best individual
    activation_id=1,        # NN: TANH activation
    save_nn=True,           # save NN state
    overwrite=False,        # save overwriting previous generation
    save_path_nn="./data/", # save path
    save_interval=25,       # save every n generations
    epochs=1000,            # GA: number of training generations
    nb_batches=100,         # Reset pads every N generations if random
    fit_min=-2000,          # Fitness threshold to detect stagnation
    fit_streak=5,           # Generations below threshold to trigger pad reset
    verbose=False
)

Core simulation

The GameLogic class encapsulates the state and physics of the lander:
- reset: Sets initial position, velocity, fuel, and landing status based on game_cfg. Recalculates the landing pad center.
- _apply_action: Modifies velocity and fuel based on the chosen action (0: Noop, 1: Up, 2: Left, 3: Right); a sketch follows this list.
- _update_physics: Updates velocity based on gravity and friction, then updates position. Prevents the lander from going below ground level (also sketched below).
- _check_landing_crash: Determines whether the lander has touched the ground, and checks if it is within the landing pad boundaries and below the maximum velocity thresholds (max_vx, max_vy). Sets the landed, crashed, and landed_successfully flags; see the sketch after the update/get_state snippets.
- update: The main step function called by the game loop. It applies the action, updates physics, checks for landing/crash, increments the time step, and returns the new state and a 'done' flag.
- is_done: Returns True if the episode has ended (landed or crashed).
- get_state: Computes and returns the current state vector, normalized for the neural network (velocities, distance to pad, fuel, vertical acceleration).
- get_render_info: Returns a dictionary with the information needed to draw the game state.
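
Neither _apply_action nor _update_physics appears in the original listing; a minimal sketch of both, assuming per-step thrust constants in lander_cfg (lcfg.thrust and lcfg.side_thrust are hypothetical names) and the gravity/friction fields of planet_cfg:

# Hypothetical sketch of _apply_action / _update_physics (mod_game_logic.py).
# Only the action mapping (0: Noop, 1: Up, 2: Left, 3: Right) comes from the
# description above; thrust constants and ground clamping are assumptions.
def _apply_action(self, action: int):
    self.last_action = action            # remembered for flame rendering
    if action == 0 or self.fuel <= 0:
        return                           # no thrust without fuel
    if action == 1:
        self.vy -= lcfg.thrust           # main engine pushes up (y grows down)
    elif action == 2:
        self.vx -= lcfg.side_thrust      # push left
    elif action == 3:
        self.vx += lcfg.side_thrust      # push right
    self.fuel -= 1

def _update_physics(self):
    self.vy += pcfg.g                    # gravity accelerates downward
    self.vx *= (1.0 - pcfg.friction)     # friction damps horizontal motion
    self.x += self.vx
    self.y += self.vy
    # Never let the lander sink below the ground line
    self.y = min(self.y, gcfg.pad_y1 - lcfg.height)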


# Snippet from GameLogic.update in mod_game_logic.py
def update(self, action: int):
    if self.is_done():
        return self.get_state(), True # Return current state if already done

    self._apply_action(action)
    self._update_physics()
    self._check_landing_crash()

    self.time_step += 1

    done = self.is_done()
    state = self.get_state() # Get state for NN

    return state, done

# Snippet from GameLogic.get_state in mod_game_logic.py
def get_state(self) -> np.ndarray:
    dist_target_x = self.x - self.landing_pad_center_x
    dist_target_y = self.y - self.landing_pad_y

    state = np.array([
        self.vx / gcfg.max_vx,      # Normalized Vx
        self.vy / gcfg.max_vy,      # Normalized Vy
        dist_target_x / cfg.width,  # Normalized distance X to pad center
        dist_target_y / cfg.height, # Normalized distance Y to pad top
        self.fuel / lcfg.max_fuel,  # Normalized Fuel
        self.ay / pcfg.g if pcfg.g != 0 else self.ay # Normalized vertical acceleration
    ], dtype=float)
    # Clip normalized velocities to avoid extremes
    state[0] = np.clip(state[0], -2, 2)
    state[1] = np.clip(state[1], -2, 2)
    state[2] = np.clip(state[2], -2, 2) # Clip distance X
    state[3] = np.clip(state[3], -2, 2) # Clip distance Y

    return state
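
The _check_landing_crash helper is also not shown; a plausible sketch using only the constraints documented in game_cfg (the boundary arithmetic is an assumption):

# Hypothetical sketch of _check_landing_crash (mod_game_logic.py); the flag
# names follow the description above, the boundary arithmetic is assumed.
def _check_landing_crash(self):
    touchdown_y = gcfg.pad_y1 - lcfg.height   # lander y at ground contact
    if self.y < touchdown_y:
        return                                # still airborne

    self.landed = True
    on_pad = (gcfg.lpad_x1 <= self.x and
              self.x + lcfg.width <= gcfg.lpad_x1 + gcfg.lpad_width)
    slow = abs(self.vx) <= gcfg.max_vx and abs(self.vy) <= gcfg.max_vy
    self.landed_successfully = on_pad and slow
    self.crashed = not self.landed_successfully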

Lander visual assets

The LanderVisuals class is responsible for loading and providing the lander and flame images:
- __init__: Loads the lander image (lander_1.png or lander_2.png, depending on lander_cfg) and the flame images (vertical_flames.png, horizontal_flames_l.png, horizontal_flames_r.png) using Pygame. It also pre-scales these images to the dimensions defined in lander_cfg.
- get_scaled_images: Returns a dictionary containing the pre-scaled Pygame surfaces for easy access during rendering.


# Structure of LanderVisuals in mod_lander.py
class LanderVisuals:
    def __init__(self):
        # Store dimensions from lcfg
        self.width = lcfg.width
        # ... other dimensions ...

        # Load assets using pygame.image.load
        try:
            self.image = pygame.image.load('assets/png/' + lcfg.img + '.png').convert_alpha()
            self.vflames = pygame.image.load('assets/png/vertical_flames.png').convert_alpha()
            # ... load lflames, rflames ...
        except pygame.error as e:
            print(f"Error loading lander assets: {e}")
            raise  # the scaling below cannot proceed without the images

        # Pre-scale images using pygame.transform.scale
        self.lander_img_scaled = pygame.transform.scale(self.image, (self.width, self.height))
        # ... scale vflames, lflames, rflames ...

    def get_scaled_images(self):
        return {
            "lander": self.lander_img_scaled,
            "vflames": self.vflames_img_scaled,
            "lflames": self.lflames_img_scaled,
            "rflames": self.rflames_img_scaled
        }

Neural network training and interaction

The module mod_nn_train.py defines the NeuralNetwork class, which acts as a Python wrapper around the C++ neural network/GA library (cpp_nn_py):
- init: Creates a new C++ ANN_MLP_GA_double object with the network structure (_nnsize, derived from nn_config), population size, and other GA parameters, then creates the initial population.
- load: Loads a previously saved network state (.hd5 file) using the C++ object's Deserialize method and retrieves the generation count from the loaded network.
- save: Saves the current network state to an .hd5 file using Serialize, and creates a _last.hd5 copy for easy access to the latest model.
- _calculate_terminal_penalty: Computes the fitness penalty/reward at the end of an episode based on success/failure, fuel remaining, final velocity, and steps taken. Lower scores are better; see the sketch after this list.
- train: Runs the genetic algorithm training loop. For each generation it:
  - Optionally resets pad positions based on nn_config.nb_batches.
  - Iterates through each member of the population:
    - Runs a full game simulation using GameLogic.
    - Gets actions from the current member's NN using feedforward.
    - Accumulates step penalties (e.g., distance from the pad).
    - Computes the terminal penalty using _calculate_terminal_penalty.
    - Stores the final fitness score (step penalties + terminal penalty).
  - Sorts members by fitness (lower is better).
  - Updates the C++ GA object's weights and biases using UpdateWeightsAndBiases.
  - Creates the next generation's population using CreatePopulation.
  - Updates and prints generation statistics.
  - Checks for fitness stagnation and potentially resets the pads.
  - Saves the network periodically.
- get_action: Takes the current game state, feeds it to the C++ network (feedforward for member 0, assumed to be the best), and returns the action corresponding to the highest output neuron; see the sketch after the train snippet below.
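
The exact weighting inside _calculate_terminal_penalty is internal to the project; the sketch below only illustrates the shape such a function might take, with assumed coefficients (only the sign convention, lower is better, comes from the description above):

# Hypothetical sketch of _calculate_terminal_penalty (mod_nn_train.py).
# All coefficients are assumptions; lower scores are better.
def _calculate_terminal_penalty(self, sim: GameLogic, steps: int) -> float:
    if sim.landed_successfully:
        # Reward success, remaining fuel, and a quick landing
        return -1000.0 - sim.fuel - 0.1 * (game_cfg.max_steps - steps)
    # Crash or timeout: penalize impact speed and distance from the pad
    speed = abs(sim.vx) + abs(sim.vy)
    dist = abs(sim.x - sim.landing_pad_center_x)
    return 500.0 + 10.0 * speed + dist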


# Snippet from NeuralNetwork.train in mod_nn_train.py
def train(self):
    # ... Initialization checks ...
    print(f"Starting training for {cfg.epochs} generations...")
    # ... Print config details ...

    population_size = self._net.GetPopSize()

    for gen in range(self._nGen, self._nGen + cfg.epochs):
        # ... Optional pad reset logic ...

        fitness_scores = np.zeros(population_size, dtype=np.float64)
        all_steps = []

        print(f"\n--- Generation {gen + 1} ---")

        for member_id in range(population_size):
            game_sim = GameLogic(no_print=True) # Simulate game for this member
            state = game_sim.get_state()
            done = False
            accumulated_step_penalty = 0.0
            steps = 0

            while not done and steps < game_cfg.max_steps:
                inputs = np.array(state, dtype=np.float64)
                outputs = np.zeros(self._nnsize[-1], dtype=np.float64)

                # Get action from this member's NN
                self._net.feedforward(inputs, outputs, member_id, False)
                action = np.argmax(outputs)

                next_state, done = game_sim.update(action) # Update game

                # Calculate step penalties (e.g., distance)
                dist_x = abs(game_sim.x - game_sim.landing_pad_center_x)
                dist_y = abs(game_sim.y - game_sim.landing_pad_y)
                accumulated_step_penalty += (dist_x + dist_y) * 0.001

                state = next_state
                steps += 1
                if done: break

            # Calculate terminal penalty
            terminal_penalty = self._calculate_terminal_penalty(game_sim, steps)
            fitness_scores[member_id] = accumulated_step_penalty + terminal_penalty
            all_steps.append(steps)

        # Update GA based on fitness
        sorted_indices = np.argsort(fitness_scores) # Lower is better
        self._net.UpdateWeightsAndBiases(sorted_indices)
        self._net.CreatePopulation(cfg.elitism)
        self._net.UpdateEpochs(1)
        self._nGen = self._net.GetEpochs()

        # Print stats, check stagnation, save periodically
        # ...

    print("\nTraining finished.")

C++ Binding definition

The interface file ann_mlp_ga_py_interface.h uses pybind11 to create Python bindings for the C++ ANN_MLP_GA template class (specialized for float and double). It exposes the C++ class constructor, methods for configuration (SetName, SetEpochs), state management (Serialize, Deserialize, GetEpochs), GA operations (UpdateWeightsAndBiases, CreatePopulation), and the core feedforward function. This allows the Python code (mod_nn_train.py) to instantiate and interact with the C++ neural network objects.


// Snippet from ann_mlp_ga_py_interface.h
PYBIND11_MODULE(cpp_nn_py, m)
{
    // expose ANN_MLP_GA<double> to Python
    py::class_<nn::ANN_MLP_GA<double>>(m, "ANN_MLP_GA_double")
        .def(py::init<>())
        // Constructor with network size, seed, pop size, top performers, activation, elitism
        .def(py::init<std::vector<size_t>, int, size_t, size_t, size_t, bool>())
        // ... other methods like PrintNetworkInfo, SetName, SetEpochs ...
        .def("Serialize", &nn::ANN_MLP_GA<double>::Serialize)
        .def("Deserialize", &nn::ANN_MLP_GA<double>::Deserialize)
        .def("GetPopSize", &nn::ANN_MLP_GA<double>::GetPopSize)
        .def("GetEpochs", &nn::ANN_MLP_GA<double>::GetEpochs)
        .def("UpdateWeightsAndBiases", &nn::ANN_MLP_GA<double>::UpdateWeightsAndBiases)
        // Expose feedforward, handling numpy arrays
        .def("feedforward",
             [](nn::ANN_MLP_GA<double>& self, py::array_t<double> inputs, py::array_t<double> outputs, size_t memberid,
                bool singleReturn) {
                // ... get raw pointers from numpy arrays ...
                const double* pInputs = inputs.data();
                double* pOutputs      = outputs.mutable_data();
                size_t inputsSize     = static_cast<size_t>(inputs.size());
                size_t outputsSize    = static_cast<size_t>(outputs.size());
                // Call C++ method
                self.feedforward(pInputs, inputsSize, pOutputs, outputsSize, memberid, singleReturn);
             },
             py::arg("inputs"), py::arg("outputs"), py::arg("memberid"), py::arg("singleReturn"))
        // ... TrainGA, TestGA (not used in Python?), SetMixed, CreatePopulation ...
        ;
}
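
From the Python side, constructing and querying the bound class might look like the following sketch; the constructor argument order follows the comment in the binding above, and the concrete values are taken from nn_config:

# Assumed usage of the cpp_nn_py bindings from Python.
import numpy as np
from cpp_nn_py import ANN_MLP_GA_double

# (layer sizes, seed, population size, top performers, activation id, elitism)
net = ANN_MLP_GA_double([6, 8, 64, 16, 4], 5247, 300, 10, 1, True)
net.CreatePopulation(True)

inputs = np.zeros(6, dtype=np.float64)   # one state vector
outputs = np.zeros(4, dtype=np.float64)  # one score per action
net.feedforward(inputs, outputs, 0, False)
print(np.argmax(outputs))                # index of the preferred action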

Complete code

The complete code is available on GitHub; the repository contains detailed instructions on how to set up the environment, compile the C++ module, and run the training and the tests.
