五子棋怎么玩才能玩的好（组合游戏系列3:）

厌恶的哑巴 2022-10-28 06:15:19

继上一篇完成了井字棋（N子棋）的minimax 最佳策略后，我们基于Pygame来创造一个图形游戏环境，可供人机和机器对弈，为后续模拟AlphaGo的自我强化学习算法做环境准备。OpenAI Gym 在强化学习领域是事实标准，我们最终封装成OpenAI Gym的接口。本篇所有代码都在github MyEncyclopedia/ConnectNGym。

组合游戏系列1: Leetcode中的Minimax 和 Alpha Beta剪枝（1）
组合游戏系列1: Leetcode中的Minimax 和 Alpha Beta剪枝（2）
组合游戏系列2: 井字棋Leetcode系列题解和Minimax最佳策略实现
第三篇: 井字棋、五子棋的OpenAI Gym GUI环境
第四篇: 井字棋、五子棋的蒙特卡洛树搜索（MCTS）

井字棋、五子棋 Pygame 实现

五子棋怎么玩才能玩的好（组合游戏系列3:）(1)

Pygame 井字棋玩家对弈效果

Python 上有Tkinter，PyQt等跨平台GUI类库，主要用于桌面程序编程，但此类库容量较大，编程也相对麻烦。Pygame具有代码少，开发快的优势，比较适合快速开发五子棋这类桌面小游戏。

Pygame 极简入门

与所有的GUI开发相同，Pygame也是基于事件的单线程编程模型。下面的例子包含了显示一个最简单GUI窗口，操作系统产生事件并发送到Pygame窗口，while True 控制了python主线程永远轮询事件。我们在这里仅仅判断了当前是否是关闭应用程序事件，如果是则退出进程。此外，clock 用于控制FPS。

importsys importpygame pygame.init() display=pygame.display.set_mode((800,600)) clock=pygame.time.Clock() whileTrue: foreventinpygame.event.get(): ifevent.type==pygame.QUIT: sys.exit(0) else: pygame.display.update() clock.tick(1)

PyGameBoard 主体代码

PyGameBoard类封装了Pygame实现游戏交互和显示的逻辑。上一篇中，我们完成了ConnectNGame逻辑，这里PyGameBoard需要在初始化时，指定传入ConnectNGame 实例（见下图），支持通过API 方式改变其状态，也支持GUI交互方式等待人类玩家的输入。next_user_input(self)实现了等待人类玩家输入的逻辑，本质上是循环检查GUI事件直到有合法的落子产生。

五子棋怎么玩才能玩的好（组合游戏系列3:）(2)

PyGameBoard Class Diagram

classPyGameBoard: def__init__(self,connectNGame:ConnectNGame): self.connectNGame=connectNGame pygame.init() defnext_user_input(self)->Tuple[int,int]: self.action=None whilenotself.action: self.check_event() self._render() self.clock.tick(60) returnself.action defmove(self,r:int,c:int)->int: returnself.connectNGame.move(r,c) if__name__=='__main__': connectNGame=ConnectNGame() pygameBoard=PyGameBoard(connectNGame) whilenotpygameBoard.isGameOver(): pos=pygameBoard.next_user_input() pygameBoard.move(*pos) pygame.quit()

check_event 较之极简版本增加了处理用户输入事件，这里我们仅支持人类玩家鼠标输入。方法_handle_user_input 将鼠标点击事件转换成棋盘行列值，并判断点击位置是否合法，合法则返回落子位置，类型为Tuple[int, int]，例如(0, 0)表示棋盘最左上角位置。

defcheck_event(self): foreinpygame.event.get(): ife.type==pygame.QUIT: pygame.quit() sys.exit(0) elife.type==pygame.MOUSEBUTTONDOWN: self._handle_user_input(e) def_handle_user_input(self,e:Event)->Tuple[int,int]: origin_x=self.start_x-self.edge_size origin_y=self.start_y-self.edge_size size=(self.board_size-1)*self.grid_size self.edge_size*2 pos=e.pos iforigin_x<=pos[0]<=origin_x sizeandorigin_y<=pos[1]<=origin_y size: ifnotself.connectNGame.gameOver: x=pos[0]-origin_x y=pos[1]-origin_y r=int(y//self.grid_size) c=int(x//self.grid_size) valid=self.connectNGame.checkAction(r,c) ifvalid: self.action=(r,c) returnself.action

OpenAI Gym 接口规范

OpenAI Gym规范了Agent和环境（Env）之间的互动，核心抽象接口类是gym.Env，自定义的游戏环境需要继承Env，并实现 reset、step和render方法。下面我们看一下如何具体实现ConnectNGym的这几个方法：

classConnectNGym(gym.Env): defreset(self)->ConnectNGame: """Resetsthestateoftheenvironmentandreturnsaninitialobservation. Returns: observation(object):theinitialobservation. """ raiseNotImplementedError defstep(self,action:Tuple[int,int])->Tuple[ConnectNGame,int,bool,None]: """Runonetimestepoftheenvironment'sdynamics.Whenendof episodeisreached,youareresponsibleforcalling`reset()` toresetthisenvironment'sstate. Acceptsanactionandreturnsatuple(observation,reward,done,info). Args: action(object):anactionprovidedbytheagent Returns: observation(object):agent'sobservationofthecurrentenvironment reward(float):amountofrewardreturnedafterpreviousaction done(bool):whethertheepisodehasended,inwhichcasefurtherstep()callswillreturnundefinedresults info(dict):containsauxiliarydiagnosticinformation(helpfulfordebugging,andsometimeslearning) """ raiseNotImplementedError defrender(self,mode='human'): """ Renderstheenvironment. Thesetofsupportedmodesvariesperenvironment.(Andsome environmentsdonotsupportrenderingatall.)Byconvention, ifmodeis: -human:rendertothecurrentdisplayorterminaland returnnothing.Usuallyforhumanconsumption. -rgb_array:Returnannumpy.ndarraywithshape(x,y,3), representingRGBvaluesforanx-by-ypixelimage,suitable forturningintoavideo. -ansi:Returnastring(str)orStringIO.StringIOcontaininga terminal-styletextrepresentation.Thetextcanincludenewlines andANSIescapesequences(e.g.forcolors). Note: Makesurethatyourclass'smetadata'render.modes'keyincludes thelistofsupportedmodes.It'srecommendedtocallsuper() inimplementationstousethefunctionalityofthismethod. Args: mode(str):themodetorenderwith """ raiseNotImplementedError

reset 方法

defreset(self)->ConnectNGame

重置环境状态，并返回给Agent重置后环境下观察到的状态。ConnectNGym内部维护了ConnectNGame实例作为自身状态，每个agent落子后会更新这个实例。由于棋类游戏对于玩家来说是完全信息的，我们直接返回ConnectNGame的deepcopy。

step 方法

defstep(self,action:Tuple[int,int])->Tuple[ConnectNGame,int,bool,None]

Agent 选择了某一action后，由环境来执行这个action并返回4个值：1. 执行后的环境Agent观察到的状态；2. 环境执行了这个action回馈给agent的reward；3. 环境是否结束；4. 其余信息。

step方法是最核心的接口，因此举例来说明ConnectNGym中的输入和输出：

初始状态

五子棋怎么玩才能玩的好（组合游戏系列3:）(3)

状态 ((0, 0, 0), (0, 0, 0), (0, 0, 0))

Agent A 选择action = (0, 0)，执行ConnectNGym.step 后返回值：status = ((1, 0, 0), (0, 0, 0), (0, 0, 0))，reward = 0，game_end = False

五子棋怎么玩才能玩的好（组合游戏系列3:）(4)

状态 ((1, 0, 0), (0, 0, 0), (0, 0, 0))

Agent B 选择action = (1, 1)，执行ConnectNGym.step 后返回值：status = ((1, 0, 0), (0, -1, 0), (0, 0, 0))，reward = 0，game_end = False

五子棋怎么玩才能玩的好（组合游戏系列3:）(5)

状态 ((1, 0, 0), (0, -1, 0), (0, 0, 0))

重复此过程直至游戏结束，下面是5步后游戏可能达到的最终状态

五子棋怎么玩才能玩的好（组合游戏系列3:）(6)

终结状态 ((1, 1, 1), (-1, -1, 0), (0, 0, 0))

此时step的返回值为：status = ((1, 1, 1), (-1, -1, 0), (0, 0, 0))，reward = 1，game_end = True

render 方法

defrender(self,mode='human')

展现环境，通过mode区分是否是人类玩家。

ConnectNGym 代码

classConnectNGym(gym.Env): def__init__(self,pygameBoard:PyGameBoard,isGUI=True,displaySec=2): self.pygameBoard=pygameBoard self.isGUI=isGUI self.displaySec=displaySec self.action_space=spaces.Discrete(pygameBoard.board_size*pygameBoard.board_size) self.observation_space=spaces.Discrete(pygameBoard.board_size*pygameBoard.board_size) self.seed() self.reset() defreset(self)->ConnectNGame: self.pygameBoard.connectNGame.reset() returncopy.deepcopy(self.pygameBoard.connectNGame) defstep(self,action:Tuple[int,int])->Tuple[ConnectNGame,int,bool,None]: #assertself.action_space.contains(action) r,c=action reward=REWARD_NONE result=self.pygameBoard.move(r,c) ifself.pygameBoard.isGameOver(): reward=result returncopy.deepcopy(self.pygameBoard.connectNGame),reward,notresultisNone,None defrender(self,mode='human'): ifnotself.isGUI: self.pygameBoard.connectNGame.drawText() time.sleep(self.displaySec) else: self.pygameBoard.display(sec=self.displaySec) defget_available_actions(self)->List[Tuple[int,int]]: returnself.pygameBoard.getAvailablePositions()

井字棋（N子棋）Minimax策略玩家

图中当k=3,m=n=3即井字棋游戏中，两个minimax策略玩家的对弈效果，游戏结局符合已知的结论：井字棋的解是先手被对方逼平。

五子棋怎么玩才能玩的好（组合游戏系列3:）(7)

Minimax策略AI对弈

镜像游戏状态的DP处理

上一篇中，我们确认了井字棋的总状态数是5478。当k=3, m=n=4时是6035992，k=4, m=n=4时是9722011，总的来说游戏状态数是以指数级增长的。上一版minimax DP策略还有改善的空间，第一种是旋转格局的处理。对于任意一种棋盘格局可以得到90度旋转后的另外三种格局，它们的最佳结局是一致的。因此，我们在递归过程中解得某一棋盘格局后，将其另外三种旋转后格局的解也一起缓存起来。例如：

五子棋怎么玩才能玩的好（组合游戏系列3:）(8)

某游戏状态

五子棋怎么玩才能玩的好（组合游戏系列3:）(9)

旋转后的三种游戏状态

defsimilarStatus(self,status:Tuple[Tuple[int,...]])->List[Tuple[Tuple[int,...]]]: ret=[] rotatedS=status for_inrange(4): rotatedS=self.rotate(rotatedS) ret.append(rotatedS) returnret defrotate(self,status:Tuple[Tuple[int,...]])->Tuple[Tuple[int,...]]: N=len(status) board=[[ConnectNGame.AVAILABLE]*Nfor_inrange(N)] forrinrange(N): forcinrange(N): board[c][N-1-r]=status[r][c] returntuple([tuple(board[i])foriinrange(N)])

Minimax 策略预计算

之前我们对每个棋局去计算最佳的下一步，并在此过程中做了剪枝，即当已经找到当前玩家必胜落子时直接返回。这对于单一局面的计算是较优的，但是AI Agent 需要在每一步都重复这个过程，当棋盘大小>3时运算非常耗时，因此我们来做第二种优化。初始空棋盘时使用Minimax来保证遍历所有状态，缓存所有棋局的最佳结果。对于AI Agent面临的每个棋局只需查找此棋局下所有的可能落子位置，并返回最佳决定，这样大大减少了每次棋局下重复的minimax递归计算。相关代码如下。

classPlannedMinimaxStrategy(Strategy): def__init__(self,game:ConnectNGame): super().__init__() self.game=copy.deepcopy(game) self.dpMap={}#game_status=>result,move self.result=self.minimax(game.getStatus()) defaction(self,game:ConnectNGame)->Tuple[int,Tuple[int,int]]: game=copy.deepcopy(game) player=game.currentPlayer bestResult=player*-1#assumeopponentwinasworstresult bestMove=None formoveingame.getAvailablePositions(): game.move(*move) status=game.getStatus() game.undo() result=self.dpMap[status] ifplayer==ConnectNGame.PLAYER_A: bestResult=max(bestResult,result) else: bestResult=min(bestResult,result) #updatebestMoveifanyimprovement bestMove=moveifbestResult==resultelsebestMove print(f'move{move}=>{result}') returnbestResult,bestMove

Agent 类和对弈逻辑

Agent 类的抽象并不是 OpenAI Gym的规范，出于代码扩展性，我们也封装了Agent基类及其子类，包括AI玩家和人类玩家。BaseAgent需要子类实现 act方法，默认实现为随机决定。

classBaseAgent(object): def__init__(self): pass defact(self,game:PyGameBoard,available_actions): returnrandom.choice(available_actions)

AIAgent 实现act并代理给 strategy 的action方法。

classAIAgent(BaseAgent): def__init__(self,strategy:Strategy): self.strategy=strategy defact(self,game:PyGameBoard,available_actions): result,move=self.strategy.action(game.connectNGame) assertmoveinavailable_actions returnmove

HumanAgent 实现act并代理给 PyGameBoard 的next_user_input方法。

classHumanAgent(BaseAgent): def__init__(self): pass defact(self,game:PyGameBoard,available_actions): returngame.next_user_input()

五子棋怎么玩才能玩的好（组合游戏系列3:）(10)

下面代码展示如何将Agent，ConnectNGym，PyGameBoard 等所有上述类串联起来，完成人人对弈，人机对弈。

defplay_ai_vs_ai(env:ConnectNGym): plannedMinimaxAgent=AIAgent(PlannedMinimaxStrategy(env.pygameBoard.connectNGame)) play(env,plannedMinimaxAgent,plannedMinimaxAgent) defplay(env:ConnectNGym,agent1:BaseAgent,agent2:BaseAgent): agents=[agent1,agent2] whileTrue: env.reset() done=False agent_id=-1 whilenotdone: agent_id=(agent_id 1)%2 available_actions=env.get_available_actions() agent=agents[agent_id] action=agent.act(pygameBoard,available_actions) _,reward,done,info=env.step(action) env.render(True) ifdone: print(f'result={reward}') time.sleep(3) break if__name__=='__main__': pygameBoard=PyGameBoard(connectNGame=ConnectNGame(board_size=3,N=3)) env=ConnectNGym(pygameBoard) env.render(True) play_ai_vs_ai(env)

五子棋怎么玩才能玩的好（组合游戏系列3:）(11)

Class Diagram 总览

频道定期发文AI，算法，计算机技术，喜欢文章请给我点支持：关注，点赞，转发。中英文版本首发于 MyEncyclopedia 微信公众号，也欢迎大家关注。此文最佳阅读姿势请点击下方了解更多并在电脑版浏览器中打开。相关代码均在github MyEncyclopedia中。

著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

展开全文

免责声明：本文仅代表文章作者的个人观点，与本站无关。其原创性、真实性以及文中陈述文字和内容未经本站证实，对本文以及其中全部或者部分内容文字的真实性、完整性和原创性本站不作任何保证或承诺，请读者仅作参考，并自行核实相关内容。文章投诉邮箱：anhduc.ph@yahoo.com

秒懂生活

五子棋怎么玩才能玩的好（组合游戏系列3:）

井字棋、五子棋 Pygame 实现

Pygame 极简入门

PyGameBoard 主体代码

OpenAI Gym 接口规范

reset 方法

step 方法

render 方法

ConnectNGym 代码

井字棋（N子棋）Minimax策略玩家

镜像游戏状态的DP处理

Minimax 策略预计算

Agent 类和对弈逻辑

猜您喜欢

网易唯一一款良心游戏（网易凭借三款精品游戏）

网易梦幻西游手游好吗（网易经典游戏梦幻西游手游版）

csgo官方公布的开箱概率（CSGO国服公布游戏内开箱概率）

王者荣耀s25赛季中路最强势的英雄（王者荣耀S25孙悟空攻略）

王者荣耀怎样改荣耀战区的（王者荣耀荣耀战区怎么改）

steam不得不玩的联机游戏（Steam上的欢乐向联机游戏盘点）

steam最贵的游戏是哪个（Steam最贵的十款游戏）

热门推荐

排行榜